[Python-Dev] os.path.walk() lacks 'depth first' option

Noah Spurrier noah@noah.org
Mon, 21 Apr 2003 11:55:31 -0700


Guido>> This idea has merit, although I'm not sure I'd call this depth first;
Guido>> it's more a matter of pre-order vs. post-order, isn't it?
I thought the names were synonymous, but a quick look on Google
showed that post-order seems more specific to binary trees whereas
depth first is more general, but I didn't look very hard and all my
college text books are in storage :-) Depth first is more intuitive, but
post order is more descriptive of what the algorithm does.
If I were writing documentation (or reading it) I would prefer "depth first".

Guido>> - How often does one need this?
I write these little directory/file filters quite often. I have come across
this problem of renaming the directories you are traversing before.
In the past the trees were small, so I just renamed the directories by
hand and used os.path.walk() to handle the files. Recently I had to rename
a very large tree which prompted me to look for a better solution.

Guido>> - When needed, how hard is it to hand-code a directory walk?  It's not
Guido>>   like the body of the walk() function is rocket science.
True, it is easy to write. It would make a good exercise for a beginner, but
I think it's better to have it than to not have it since I think a big
part of the appear of Python is the "little" algorithms.
It's also fits with the Python Batteries Included philosophy and
benefits the "casual" Python user. Finally, I just find it generally useful.
I use directory walkers a lot.

david>> That's hardly the point of improving the standard library, though, is
david>> it?  I'm all for putting the kitchen sink in there, especially if it
david>> originates with a use case ("I had some dishes to wash..." ;-)
Guido>
Guido>But if I had to do it over again, I wouldn't have added walk() in the
Guido>current form.  I often find it harder to fit a particular program's
Guido>needs in the API offered by walk() than it is to reimplement the walk
Guido>myself.  That's why I'm concerned about adding to it.

The change is small and the interface is backward compatible, but
if you are actually trying to discourage people from using os.path.walk()
in the future then I would vote for deprecating it and
replacing it with a generator where the default is depthfirst ;-)

Below is a sample tree walker using a generator
I was delighted to find that they work in recursive functions, but
it gave me a headache to think about for the first time.
Perhaps it could be prettier, but this demonstrates the basic idea.

Yours,
Noah


# Inspired by Doug Fort from an ActiveState Python recipe.
# His version didn't use recursion and didn't do depth first.
import os
import stat

def walktree (top = ".", depthfirst = True):
     """This walks a directory tree, starting from the 'top' directory.
     This is somewhat like os.path.walk, but using generators
     instead of a visit function. One important difference is that
     walktree() defaults to DEPTH first with optional BREADTH first,
     whereas the os.path.walk function allows only BREADTH first.
     Depth first was made the default because it is safer if
     you are going to be modifying the directory names you visit.
     This avoids the problem of renaming a directory before visiting
     the children of that directory.
     """
     names = os.listdir(top)

     if not depthfirst:
         yield top, names

     for name in names:
         try:
             st = os.lstat(os.path.join(top, name))
         except os.error:
             continue
         if stat.S_ISDIR(st.st_mode):
             for (newtop, children) in walktree (os.path.join(top, name), depthfirst):
                 yield newtop, children

     if depthfirst:
         yield top, names

def test():
     for (basepath, children) in walktree():
         for child in children:
	    print os.path.join(basepath, child)

if __name__ == '__main__':
     test()