[Python-Dev] os.path.walk() lacks 'depth first' option
Mon, 21 Apr 2003 11:55:31 -0700
Guido>> This idea has merit, although I'm not sure I'd call this depth first;
Guido>> it's more a matter of pre-order vs. post-order, isn't it?
I thought the names were synonymous, but a quick look on Google
showed that post-order seems more specific to binary trees whereas
depth first is more general, but I didn't look very hard and all my
college text books are in storage :-) Depth first is more intuitive, but
post order is more descriptive of what the algorithm does.
If I were writing documentation (or reading it) I would prefer "depth first".
Guido>> - How often does one need this?
I write these little directory/file filters quite often. I have come across
this problem of renaming the directories you are traversing before.
In the past the trees were small, so I just renamed the directories by
hand and used os.path.walk() to handle the files. Recently I had to rename
a very large tree which prompted me to look for a better solution.
Guido>> - When needed, how hard is it to hand-code a directory walk? It's not
Guido>> like the body of the walk() function is rocket science.
True, it is easy to write. It would make a good exercise for a beginner, but
I think it's better to have it than to not have it since I think a big
part of the appear of Python is the "little" algorithms.
It's also fits with the Python Batteries Included philosophy and
benefits the "casual" Python user. Finally, I just find it generally useful.
I use directory walkers a lot.
david>> That's hardly the point of improving the standard library, though, is
david>> it? I'm all for putting the kitchen sink in there, especially if it
david>> originates with a use case ("I had some dishes to wash..." ;-)
Guido>But if I had to do it over again, I wouldn't have added walk() in the
Guido>current form. I often find it harder to fit a particular program's
Guido>needs in the API offered by walk() than it is to reimplement the walk
Guido>myself. That's why I'm concerned about adding to it.
The change is small and the interface is backward compatible, but
if you are actually trying to discourage people from using os.path.walk()
in the future then I would vote for deprecating it and
replacing it with a generator where the default is depthfirst ;-)
Below is a sample tree walker using a generator
I was delighted to find that they work in recursive functions, but
it gave me a headache to think about for the first time.
Perhaps it could be prettier, but this demonstrates the basic idea.
# Inspired by Doug Fort from an ActiveState Python recipe.
# His version didn't use recursion and didn't do depth first.
def walktree (top = ".", depthfirst = True):
"""This walks a directory tree, starting from the 'top' directory.
This is somewhat like os.path.walk, but using generators
instead of a visit function. One important difference is that
walktree() defaults to DEPTH first with optional BREADTH first,
whereas the os.path.walk function allows only BREADTH first.
Depth first was made the default because it is safer if
you are going to be modifying the directory names you visit.
This avoids the problem of renaming a directory before visiting
the children of that directory.
names = os.listdir(top)
if not depthfirst:
yield top, names
for name in names:
st = os.lstat(os.path.join(top, name))
for (newtop, children) in walktree (os.path.join(top, name), depthfirst):
yield newtop, children
yield top, names
for (basepath, children) in walktree():
for child in children:
print os.path.join(basepath, child)
if __name__ == '__main__':