[Python-Dev] os.path.walk() lacks 'depth first' option

Noah Spurrier noah@noah.org
Mon, 21 Apr 2003 15:15:09 -0700


I like your version; although I used a different name to avoid
confusion with os.path.walk.

Note that os.listdir does not include the special entries '.' and '..'
even if they are present in the directory, so there is no need
to remove them.

Tim Peters>    for path in walk(root):
Tim Peters>Or it could look like
Tim Peters>    for top, names in walk(root):
Tim Peters>or
Tim Peters>    for top, dirnames, nondirnames in walk(root):

I like the idea of yielding (top, dirs, nondirs), but
often I want to perform the same operations on both
dirs and nondirs so separating them doesn't help that case.
This seems to be a situation where there is no typical case,
so my preference is for the simpler interface.
It also eliminates the need to build two new lists from
the list you get from os.listdir()... In fact, I prefer
your first suggestion (for path in walk(root)), but that
would require building a new list by prepending the
basepath to each element of children because os.listdir does not
return full path. So finally in this example, I just went with
returning the basepath and the children (both files and directories).

Following Tom Good's example I added an option to
ignore symbolic links. It would be better to detect
cycles or at least prevent going higher up in the tree.

Tim Peters>obvious topdown argument, note a subtlety:  when topdown is True, the caller
Tim Peters>can prune the search by mutating the dirs list yielded to it.  For example,
This example still allows you to prune the search
in Breadth first mode by removing elements from
the children list. That is cool.
     for top, children in walk('C:/code/python', depthfirst=False):
         print top, children
         if 'CVS' in children:
             children.remove('CVS')

Yours,
Noah

from __future__ import generators # needed for Python 2.2
# Inspired by Doug Fort from an ActiveState Python recipe.
# His version didn't use recursion and didn't do depth first.
import os

def walktree (basepath=".", depthfirst=True, ignorelinks=True):
     """This walks a directory tree, starting from the basepath directory.
     This is somewhat like os.path.walk, but using generators
     instead of a visit function. One important difference is that
     walktree() defaults to DEPTH first with optional BREADTH first,
     whereas the os.path.walk function allows only BREADTH first.
     Depth first was made the default because it is safer if
     you are going to be modifying the directory names you visit.
     This avoids the problem of renaming a directory before visiting
     the children of that directory.

     The ignorelinks option determines whether to follow symbolic links.
     Some symbolic links can lead to recursive traversal cycles.
     A better way would be to detect and prune cycles.
     """
     children = os.listdir(basepath)

     if not depthfirst:
         yield basepath, children

     for name in children:
         fullpath = os.path.join(basepath, name)
         if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)):
             for (next_basepath, next_children) in walktree (fullpath, depthfirst, ignorelinks):
                 yield next_basepath, next_children

     if depthfirst:
         yield basepath, children

def test():
     for (basepath, children) in walktree():
         for name in children:
             print os.path.join(basepath, name)

if __name__ == '__main__':
     test()