[Python-Dev] os.path.walk() lacks 'depth first' option

Tue, 22 Apr 2003 01:32:21 -0700

Tim>The callback can rename b and e, and change the contents of the fnames list
Tim>to ["bx", "ex"] so that walk will find the renamed directories.  Etc.

Ha! This is sweet, but I would call this solution "nonobvious".
But perhaps it is a good argument for not modifying os.path.walk(), yet
should a walktree generator be included in Python's future I hope that
it will have the explicit option for postorder depth first.

Tim> Sorry, I'm unmovable on this point.  My typical uses for this function do
Tim> have to separates dirs from non-dirs, walk() has to make the distinction
Tim> *anyway* (for its internal use), and it's expensive for the client to do the
Tim> join() and isdir() bits all over again (isdir() is a filesystem op, and at
Tim> least on my box repeated isdir() is overwhelmingly more costly than
Tim> partitioning or joining a Python list).

I'm probably less adamant on this point than you :-)
And you are right, it's cheaper for me to simply run through both lists
than it would be to loop over a conditional based on isdir().

Tim> What about that worries you?  I don't like it because I have some
Tim> directories with many thousands of files, and stuffing a long redundant path
Tim> at the start of each is wasteful in the abstract.  I'm not sure it really
Tim> matters, though -- e.g., 10K files in a directory * 200 redundant chars each
Tim> = a measly 2 megabytes wasted <wink>.

That was also what bothered me ;-) I guess it's more of a habit than necessity.

Tim> Not all Python platforms have symlinks, of course.  The traditional answer

True, but checking a file with os.path.islink() should be safe even
on platforms without links -- if the docs are to be believed.
The docs says that platforms that don't support links will always return
False for islink(). The Python docs are a little inconsistent on links.
1. os.path.islink(path) claims to be only check links on UNIX and
always false if symbolic links are not supported.
2. os.readlink(path) is only available on UNIX and is not defined on Windows.
3. os.path.realpath(path) claims to be only available on UNIX, but it
is actually defined and returns the given path if you call it on Windows.

Tim> I'm finding you too hard to follow here, because your use of "depthfirst"
Tim> and "breadthfirst" doesn't follow normal usage of the terms.  Here's normal

You are right. I will stop calling it Breadth First now. Feel free to dope slap me.

This confusion on my part was due to the apparent order when one
prints the elements of the names list when the visit function
is called. It would print B, C, D, E, F, G, H, I, J, K, but
that's the parent printing the children, not the children printing themselves
as they are visited. Oh... (a small, dim light clicks on.)

Still, walktree should have the option to hit the bottom of a branch and
then process on it's way back up (post-order).

OK, how is this following version?

Yours,
Noah

from __future__ import generators # needed for Python 2.2
import os

def walktree (basepath=".", postorder=True, ignorelinks=True):
     """This walks a directory tree, starting from the basepath directory.
     This is somewhat like os.path.walk, but using generators
     instead of a visit function. One important difference is that
     walktree() defaults to postorder with optional preorder,
     whereas the os.path.walk function allows only preorder.
     Postorder was made the default because it is safer if
     you are going to be modifying the directory names you visit.
     This avoids the problem of renaming a directory before visiting
     the children of that directory.

     The ignorelinks option determines whether to follow symbolic links.
     Some symbolic links can lead to recursive traversal cycles.
     A better way would be to detect and prune cycles.
     """
     children = os.listdir(basepath)

     dirs, nondirs = [], []

     for name in children:
         fullpath = os.path.join (basepath, name)
         if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)):
             dirs.append(name)
         else:
             nondirs.append(name)

     if not postorder:
         yield basepath, dirs, nondirs

     for name in dirs:
         for next_branch in walktree (os.path.join(basepath, name), postorder, ignorelinks):
             yield next_branch

     if postorder:
         yield basepath, dirs, nondirs

def test():
     for basepath, dirs, nondirs in walktree():
         for name in dirs:
             print os.path.join(basepath, name)
         for name in nondirs:
             print os.path.join(basepath, name)

if __name__ == '__main__':
         test()