os.path.walk() lacks 'depth first' option
data:image/s3,"s3://crabby-images/f5086/f5086b20c25bc098201ddedc582bb2a5be63fe3d" alt=""
Hello, Recently I realized that there is no easy way to walk a directory tree and rename each directory and file. The standard os.path.walk() function does a breadth first walk. This makes it hard to write scripts that modify directory names as they walk the tree because you need to visit subdirectories before you rename their parents. What is really needed is a depth first walk. For example this naive code would not work with breadth first walk: """Renames all directories and files to lower case.""" import os.path def visit (arg, dirname, names): for name in names: print os.path.join (dirname, name) oldname = os.path.join (dirname, name) newname = os.path.join (dirname, name.lower()) os.rename (oldname, newname) os.path.walk ('.', visit, None) The library source posixpath.py defined os.path.walk on my system. A comment in that file mentions that the visit function may modify the filenames list to impose a different order of visiting, but this is not possible as far as I can tell. Perhaps future versions of Python could include an option to do a depth first walk instead of the default breadth first. Modifying os.path.walk() to allow for optional depth first walking is simple. I have attached a patch to posixpath.py that demonstrates this. This adds an if conditional at the beginning and end of the walk() function. I have not checked to see if other platforms share the posixpath.py module this for the walk() function, but if there is interest then I'd be happy to cross reference this. Yours, Noah *** posixpath.py 2003-04-19 22:26:08.000000000 -0700 --- posixpath_walk_depthfirst.py 2003-04-19 22:12:48.000000000 -0700 *************** *** 259,265 **** # The func may modify the filenames list, to implement a filter, # or to impose a different order of visiting. ! def walk(top, func, arg): """Directory tree walk with callback function. For each directory in the directory tree rooted at top (including top --- 259,265 ---- # The func may modify the filenames list, to implement a filter, # or to impose a different order of visiting. ! def walk(top, func, arg, depthfirst=False): """Directory tree walk with callback function. For each directory in the directory tree rooted at top (including top *************** *** 272,284 **** order of visiting. No semantics are defined for, or required of, arg, beyond that arg is always passed to func. It can be used, e.g., to pass a filename pattern, or a mutable object designed to accumulate ! statistics. Passing None for arg is common.""" try: names = os.listdir(top) except os.error: return ! func(arg, top, names) for name in names: name = join(top, name) try: --- 272,287 ---- order of visiting. No semantics are defined for, or required of, arg, beyond that arg is always passed to func. It can be used, e.g., to pass a filename pattern, or a mutable object designed to accumulate ! statistics. Passing None for arg is common. The optional depthfirst ! argument may be set to True to walk the directory tree depth first. ! The default is False (walk breadth first).""" try: names = os.listdir(top) except os.error: return ! if not depthfirst: ! func(arg, top, names) for name in names: name = join(top, name) try: *************** *** 287,293 **** continue if stat.S_ISDIR(st.st_mode): walk(name, func, arg) ! # Expand paths beginning with '~' or '~user'. # '~' means $HOME; '~user' means that user's home directory. --- 290,297 ---- continue if stat.S_ISDIR(st.st_mode): walk(name, func, arg) ! if depthfirst: ! func(arg, top, names) # Expand paths beginning with '~' or '~user'. # '~' means $HOME; '~user' means that user's home directory. *************** *** 416,420 **** return filename supports_unicode_filenames = False - - --- 420,422 ----
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
This idea has merit, although I'm not sure I'd call this depth first; it's more a matter of pre-order vs. post-order, isn't it? But I ask two questions: - How often does one need this? - When needed, how hard is it to hand-code a directory walk? It's not like the body of the walk() function is rocket science. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
But if I had to do it over again, I wouldn't have added walk() in the current form. I often find it harder to fit a particular program's needs in the API offered by walk() than it is to reimplement the walk myself. That's why I'm concerned about adding to it. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Absolutely! So let's try to write something new based on generators, make it flexible enough so that it can handle pre-order or post-order visits, and then phase out os.walk(). --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Guido]
But if I had to do it over again, I wouldn't have added walk() in the current form.
[Neil Schemenauer]
I think it's the perfect place for a generator.
[Guido]
I posted one last night, with a bug (it failed to pass the topdown flag through to recursive calls). Here's that again, with the bug repaired, sped up some, and with a docstring. Double duty: the example in the docstring shows why we don't want to make a special case out of sum([]): empty lists can arise naturally. What else would people like in this? I really like separating the directory names from the plain-file names, so don't bother griping about that <wink>. It's at least as fast as the current os.path.walk() (it's generally faster for me, but times for this are extremely variable on Win98). Removing the internal recursion doesn't appear to make a measureable difference when walking my Python tree, although because recursive generators require time proportional to the current stack depth to deliver a result to the caller, and to resume again, removing recursion could be much more efficient on an extremely deep tree. The biggest speedup I could find on Windows was via using os.chdir() liberally, so that os.path.join() calls weren't needed, and os.path.isdir() calls worked directly on one-component names. I suspect this has to do with that Win98 doesn't have an effective way to cache directory lookups under the covers. Even so, it only amounted to a 10% speedup: directory walking is plain slow on Win98 no matter how you do it. The attached doesn't play any gross speed tricks.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Good enough for me. :-)
Please don't us chdir(), no matter how much it speeds things up. It's a disaster in a multi-threaded program. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/8acff/8acff8df3a058787867f7329e81eaa107891f153" alt=""
Guido van Rossum wrote:
Has anybody considered Jason Orendorff's path module (http://www.jorendorff.com/articles/python/path/) for inclusion in the standard library? It has a path walking generator and much, much more.
This new generator should probably support callbacks that determine whether directories should be entered or not. Bye, Walter Dörwald
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Guido]
We also have another possibility now: a pathname generator. Then the funky callback and mystery-arg ("what's the purpose of the 'arg' arg?" is a semi-FAQ on c.l.py) bits can go away, and client code could look like: for path in walk(root): # filter, if you like, via 'if whatever: continue' # accumulate state, if you like, in local vars Or it could look like for top, names in walk(root): or for top, dirnames, nondirnames in walk(root): Here's an implementation of the last flavor. Besides the more-or-less obvious topdown argument, note a subtlety: when topdown is True, the caller can prune the search by mutating the dirs list yielded to it. For example, for top, dirs, nondirs in walk('C:/code/python'): print top, dirs, len(nondirs) if 'CVS' in dirs: dirs.remove('CVS') doesn't descend into CVS subdirectories. def walk(top, topdown=True): import os try: names = os.listdir(top) except os.error: return exceptions = ('.', '..') dirs, nondirs = [], [] for name in names: if name in exceptions: continue fullname = os.path.join(top, name) if os.path.isdir(fullname): dirs.append(name) else: nondirs.append(name) if topdown: yield top, dirs, nondirs for name in dirs: for x in walk(os.path.join(top, name)): yield x if not topdown: yield top, dirs, nondirs
data:image/s3,"s3://crabby-images/3d286/3d28638e14234e232e12b6f598f6621516f8adb0" alt=""
On Sunday 20 April 2003 10:12 pm, Tim Peters wrote:
if 'CVS' in dirs: dirs.remove('CVS')
This code brought up an interesting question to me: if sets have a .discard method that removes an element without raising KeyError if the element isn't in the set, should lists perhaps have that same method? On another related front, sets (in my Python 2.3a2) raise KeyError on a .remove(elt) when elt isn't in the set. Since sets aren't mappings, should that be a ValueError (like list raises) instead? Jeremy
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Jeremy Fincher]
I don't think list.remove(x) is used enough to care, when the presence of x in the list is unknown. Adding methods for purity alone is neither Pythonic nor Perlish <wink>.
Since sets aren't sequences either, why should sets raise the same exception lists raise? It's up to the type to use whichever fool exceptions it chooses. This doesn't always make life easy for users, alas -- there's not much consistency in exception behavior across packages. In this case, a user would be wise to avoid expecting IndexError or KeyError, and catch their common base class (LookupError) instead. The distinction between IndexError and KeyError isn't really useful (IMO; LookupError was injected as a base class recently in Python's life).
data:image/s3,"s3://crabby-images/4c5e0/4c5e094efaa72edc3f091be11b2a2b05a33dd2b6" alt=""
Tim Peters <tim.one@comcast.net> writes:
I've wished for this, more than once, in the past. I can't quite remember why, I have to admit. while x in seq: seq.remove(x) is vulgar, on at least two levels. For all that, I'm not sure this is worth the pain.
Without me noticing, too! Well, I knew there was a lookup error that you get when failing to find a codec, but I didn't know IndexError and KeyError derived from it... Also note that Jeremy was suggesting *ValueError*, not IndexError... that any kind of index-or-key-ing is going on is trivia of the implementation, surely? Cheers, M. -- First of all, email me your AOL password as a security measure. You may find that won't be able to connect to the 'net for a while. This is normal. The next thing to do is turn your computer upside down and shake it to reboot it. -- Darren Tucker, asr
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Jeremy Fincher]
[Tim]
[Michael Hudson]
Oops! So he was -- I spaced out on that.
that any kind of index-or-key-ing is going on is trivia of the implementation, surely?
Sure. I don't care for ValueError in this context, though -- there's nothing wrong with the value I'm testing for set membership, after all. Of course I never cared for ValueError on a failing list.remove() either. I like ValueError best when an input is of the right type but outside the defined domain of a function, like math.sqrt(-1.0) or chr(500). Failing to find something feels more like a (possibly proper subclass of) LookupError to me. But I'd hate to create even more useless distinctions among different kinds of lookup failures, so am vaguely happy reusing the KeyError flavor of LookupError. In any case, I'm not unhappy enough with it to do something about it. I nevertheless agree Jerry raised a good point, and maybe somebody else is unhappy enough with it to change it?
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Yeah, [].remove(42) raising ValueError is a bit weird. It was put in before we had the concept of LookupError, and the rationale for using ValueError was that the *value* is not found -- can't use IndexError because the value is chosen from a different set than the index, can't use KeyError because lists don't have a concept of key. In retrospect, it would have been better to define a SearchError, subclassing LookupError. OTOH there's something to say for fewer errors, not more; e.g. sometimes I wish AttributeError and TypeError were unified, because AttributeError usually means that an object isn't of the expected type. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/f125b/f125b51aa2c9d22fc221b9f35c9f4a5d4eb69924" alt=""
[Guido van Rossum]
I do not much use `os.path.walk' myself. It is so simple to write a small walking loop with a stack of unseen directories, and in practice, there is a wide range of ways and reasons to walk a directory hierarchy, some of which do not fit nicely in the current `os.path.walk' specifications.
That's why I'm concerned about adding to it.
The addition of generators to Python also changed the picture somewhat, in this area. It is often convenient to use a generator for a particular walk. -- François Pinard http://www.iro.umontreal.ca/~pinard
data:image/s3,"s3://crabby-images/f5086/f5086b20c25bc098201ddedc582bb2a5be63fe3d" alt=""
Guido>> This idea has merit, although I'm not sure I'd call this depth first; Guido>> it's more a matter of pre-order vs. post-order, isn't it? I thought the names were synonymous, but a quick look on Google showed that post-order seems more specific to binary trees whereas depth first is more general, but I didn't look very hard and all my college text books are in storage :-) Depth first is more intuitive, but post order is more descriptive of what the algorithm does. If I were writing documentation (or reading it) I would prefer "depth first". Guido>> - How often does one need this? I write these little directory/file filters quite often. I have come across this problem of renaming the directories you are traversing before. In the past the trees were small, so I just renamed the directories by hand and used os.path.walk() to handle the files. Recently I had to rename a very large tree which prompted me to look for a better solution. Guido>> - When needed, how hard is it to hand-code a directory walk? It's not Guido>> like the body of the walk() function is rocket science. True, it is easy to write. It would make a good exercise for a beginner, but I think it's better to have it than to not have it since I think a big part of the appear of Python is the "little" algorithms. It's also fits with the Python Batteries Included philosophy and benefits the "casual" Python user. Finally, I just find it generally useful. I use directory walkers a lot. david>> That's hardly the point of improving the standard library, though, is david>> it? I'm all for putting the kitchen sink in there, especially if it david>> originates with a use case ("I had some dishes to wash..." ;-) Guido> Guido>But if I had to do it over again, I wouldn't have added walk() in the Guido>current form. I often find it harder to fit a particular program's Guido>needs in the API offered by walk() than it is to reimplement the walk Guido>myself. That's why I'm concerned about adding to it. The change is small and the interface is backward compatible, but if you are actually trying to discourage people from using os.path.walk() in the future then I would vote for deprecating it and replacing it with a generator where the default is depthfirst ;-) Below is a sample tree walker using a generator I was delighted to find that they work in recursive functions, but it gave me a headache to think about for the first time. Perhaps it could be prettier, but this demonstrates the basic idea. Yours, Noah # Inspired by Doug Fort from an ActiveState Python recipe. # His version didn't use recursion and didn't do depth first. import os import stat def walktree (top = ".", depthfirst = True): """This walks a directory tree, starting from the 'top' directory. This is somewhat like os.path.walk, but using generators instead of a visit function. One important difference is that walktree() defaults to DEPTH first with optional BREADTH first, whereas the os.path.walk function allows only BREADTH first. Depth first was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory. """ names = os.listdir(top) if not depthfirst: yield top, names for name in names: try: st = os.lstat(os.path.join(top, name)) except os.error: continue if stat.S_ISDIR(st.st_mode): for (newtop, children) in walktree (os.path.join(top, name), depthfirst): yield newtop, children if depthfirst: yield top, names def test(): for (basepath, children) in walktree(): for child in children: print os.path.join(basepath, child) if __name__ == '__main__': test()
data:image/s3,"s3://crabby-images/58a0b/58a0be886f0375938476d3eb7345a8b9d8cdc91e" alt=""
Noah Spurrier wrote:
I'm tempted to declare this off-topic: depth-first means "traverse children before traversing siblings". Depth-first comes in three variations: pre-order (traverse node first, then children, then siblings), in-order (only for binary trees: traverse left child first, then node, then right child, then sibling), post-order (traverse children first, then node, then siblings). There is also breadth-first: traverse siblings first, then children.
I write these little directory/file filters quite often. I have come across this problem of renaming the directories you are traversing before.
I still can't understand why you can't use os.path.walk for that. Did you know that you can modify the list that is passed to the callback, and that walk will continue to visit the elements in the list? Regards, Martin
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Noah Spurrier]
[Martin v. Löwis]
Let's spell it out. Say the directory structure is like so: a/ b/ c/ d/ e/ and we want to stick "x" at the end of each directory name. The first thing the callback sees is arg, "a", ["b", "e"] The callback can rename b and e, and change the contents of the fnames list to ["bx", "ex"] so that walk will find the renamed directories. Etc. This works: """ import os def renamer(arg, dirname, fnames): for i, name in enumerate(fnames): if os.path.isdir(os.path.join(dirname, name)): newname = name + "x" os.rename(os.path.join(dirname, name), os.path.join(dirname, newname)) fnames[i] = newname # crucial! os.path.walk('a', renamer, None) """ It's certainly less bother renaming bottom-up; this works too (given the last walk() generator implementation I posted): """ import os for root, dirs, files in walk('a', topdown=False): for d in dirs: os.rename(os.path.join(root, d), os.path.join(root, d + 'x')) """ A possible surprise is that neither of these renames 'a'.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
This idea has merit, although I'm not sure I'd call this depth first; it's more a matter of pre-order vs. post-order, isn't it? But I ask two questions: - How often does one need this? - When needed, how hard is it to hand-code a directory walk? It's not like the body of the walk() function is rocket science. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
But if I had to do it over again, I wouldn't have added walk() in the current form. I often find it harder to fit a particular program's needs in the API offered by walk() than it is to reimplement the walk myself. That's why I'm concerned about adding to it. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Absolutely! So let's try to write something new based on generators, make it flexible enough so that it can handle pre-order or post-order visits, and then phase out os.walk(). --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Guido]
But if I had to do it over again, I wouldn't have added walk() in the current form.
[Neil Schemenauer]
I think it's the perfect place for a generator.
[Guido]
I posted one last night, with a bug (it failed to pass the topdown flag through to recursive calls). Here's that again, with the bug repaired, sped up some, and with a docstring. Double duty: the example in the docstring shows why we don't want to make a special case out of sum([]): empty lists can arise naturally. What else would people like in this? I really like separating the directory names from the plain-file names, so don't bother griping about that <wink>. It's at least as fast as the current os.path.walk() (it's generally faster for me, but times for this are extremely variable on Win98). Removing the internal recursion doesn't appear to make a measureable difference when walking my Python tree, although because recursive generators require time proportional to the current stack depth to deliver a result to the caller, and to resume again, removing recursion could be much more efficient on an extremely deep tree. The biggest speedup I could find on Windows was via using os.chdir() liberally, so that os.path.join() calls weren't needed, and os.path.isdir() calls worked directly on one-component names. I suspect this has to do with that Win98 doesn't have an effective way to cache directory lookups under the covers. Even so, it only amounted to a 10% speedup: directory walking is plain slow on Win98 no matter how you do it. The attached doesn't play any gross speed tricks.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Good enough for me. :-)
Please don't us chdir(), no matter how much it speeds things up. It's a disaster in a multi-threaded program. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/8acff/8acff8df3a058787867f7329e81eaa107891f153" alt=""
Guido van Rossum wrote:
Has anybody considered Jason Orendorff's path module (http://www.jorendorff.com/articles/python/path/) for inclusion in the standard library? It has a path walking generator and much, much more.
This new generator should probably support callbacks that determine whether directories should be entered or not. Bye, Walter Dörwald
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Guido]
We also have another possibility now: a pathname generator. Then the funky callback and mystery-arg ("what's the purpose of the 'arg' arg?" is a semi-FAQ on c.l.py) bits can go away, and client code could look like: for path in walk(root): # filter, if you like, via 'if whatever: continue' # accumulate state, if you like, in local vars Or it could look like for top, names in walk(root): or for top, dirnames, nondirnames in walk(root): Here's an implementation of the last flavor. Besides the more-or-less obvious topdown argument, note a subtlety: when topdown is True, the caller can prune the search by mutating the dirs list yielded to it. For example, for top, dirs, nondirs in walk('C:/code/python'): print top, dirs, len(nondirs) if 'CVS' in dirs: dirs.remove('CVS') doesn't descend into CVS subdirectories. def walk(top, topdown=True): import os try: names = os.listdir(top) except os.error: return exceptions = ('.', '..') dirs, nondirs = [], [] for name in names: if name in exceptions: continue fullname = os.path.join(top, name) if os.path.isdir(fullname): dirs.append(name) else: nondirs.append(name) if topdown: yield top, dirs, nondirs for name in dirs: for x in walk(os.path.join(top, name)): yield x if not topdown: yield top, dirs, nondirs
data:image/s3,"s3://crabby-images/3d286/3d28638e14234e232e12b6f598f6621516f8adb0" alt=""
On Sunday 20 April 2003 10:12 pm, Tim Peters wrote:
if 'CVS' in dirs: dirs.remove('CVS')
This code brought up an interesting question to me: if sets have a .discard method that removes an element without raising KeyError if the element isn't in the set, should lists perhaps have that same method? On another related front, sets (in my Python 2.3a2) raise KeyError on a .remove(elt) when elt isn't in the set. Since sets aren't mappings, should that be a ValueError (like list raises) instead? Jeremy
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Jeremy Fincher]
I don't think list.remove(x) is used enough to care, when the presence of x in the list is unknown. Adding methods for purity alone is neither Pythonic nor Perlish <wink>.
Since sets aren't sequences either, why should sets raise the same exception lists raise? It's up to the type to use whichever fool exceptions it chooses. This doesn't always make life easy for users, alas -- there's not much consistency in exception behavior across packages. In this case, a user would be wise to avoid expecting IndexError or KeyError, and catch their common base class (LookupError) instead. The distinction between IndexError and KeyError isn't really useful (IMO; LookupError was injected as a base class recently in Python's life).
data:image/s3,"s3://crabby-images/4c5e0/4c5e094efaa72edc3f091be11b2a2b05a33dd2b6" alt=""
Tim Peters <tim.one@comcast.net> writes:
I've wished for this, more than once, in the past. I can't quite remember why, I have to admit. while x in seq: seq.remove(x) is vulgar, on at least two levels. For all that, I'm not sure this is worth the pain.
Without me noticing, too! Well, I knew there was a lookup error that you get when failing to find a codec, but I didn't know IndexError and KeyError derived from it... Also note that Jeremy was suggesting *ValueError*, not IndexError... that any kind of index-or-key-ing is going on is trivia of the implementation, surely? Cheers, M. -- First of all, email me your AOL password as a security measure. You may find that won't be able to connect to the 'net for a while. This is normal. The next thing to do is turn your computer upside down and shake it to reboot it. -- Darren Tucker, asr
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Jeremy Fincher]
[Tim]
[Michael Hudson]
Oops! So he was -- I spaced out on that.
that any kind of index-or-key-ing is going on is trivia of the implementation, surely?
Sure. I don't care for ValueError in this context, though -- there's nothing wrong with the value I'm testing for set membership, after all. Of course I never cared for ValueError on a failing list.remove() either. I like ValueError best when an input is of the right type but outside the defined domain of a function, like math.sqrt(-1.0) or chr(500). Failing to find something feels more like a (possibly proper subclass of) LookupError to me. But I'd hate to create even more useless distinctions among different kinds of lookup failures, so am vaguely happy reusing the KeyError flavor of LookupError. In any case, I'm not unhappy enough with it to do something about it. I nevertheless agree Jerry raised a good point, and maybe somebody else is unhappy enough with it to change it?
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Yeah, [].remove(42) raising ValueError is a bit weird. It was put in before we had the concept of LookupError, and the rationale for using ValueError was that the *value* is not found -- can't use IndexError because the value is chosen from a different set than the index, can't use KeyError because lists don't have a concept of key. In retrospect, it would have been better to define a SearchError, subclassing LookupError. OTOH there's something to say for fewer errors, not more; e.g. sometimes I wish AttributeError and TypeError were unified, because AttributeError usually means that an object isn't of the expected type. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/f125b/f125b51aa2c9d22fc221b9f35c9f4a5d4eb69924" alt=""
[Guido van Rossum]
I do not much use `os.path.walk' myself. It is so simple to write a small walking loop with a stack of unseen directories, and in practice, there is a wide range of ways and reasons to walk a directory hierarchy, some of which do not fit nicely in the current `os.path.walk' specifications.
That's why I'm concerned about adding to it.
The addition of generators to Python also changed the picture somewhat, in this area. It is often convenient to use a generator for a particular walk. -- François Pinard http://www.iro.umontreal.ca/~pinard
data:image/s3,"s3://crabby-images/f5086/f5086b20c25bc098201ddedc582bb2a5be63fe3d" alt=""
Guido>> This idea has merit, although I'm not sure I'd call this depth first; Guido>> it's more a matter of pre-order vs. post-order, isn't it? I thought the names were synonymous, but a quick look on Google showed that post-order seems more specific to binary trees whereas depth first is more general, but I didn't look very hard and all my college text books are in storage :-) Depth first is more intuitive, but post order is more descriptive of what the algorithm does. If I were writing documentation (or reading it) I would prefer "depth first". Guido>> - How often does one need this? I write these little directory/file filters quite often. I have come across this problem of renaming the directories you are traversing before. In the past the trees were small, so I just renamed the directories by hand and used os.path.walk() to handle the files. Recently I had to rename a very large tree which prompted me to look for a better solution. Guido>> - When needed, how hard is it to hand-code a directory walk? It's not Guido>> like the body of the walk() function is rocket science. True, it is easy to write. It would make a good exercise for a beginner, but I think it's better to have it than to not have it since I think a big part of the appear of Python is the "little" algorithms. It's also fits with the Python Batteries Included philosophy and benefits the "casual" Python user. Finally, I just find it generally useful. I use directory walkers a lot. david>> That's hardly the point of improving the standard library, though, is david>> it? I'm all for putting the kitchen sink in there, especially if it david>> originates with a use case ("I had some dishes to wash..." ;-) Guido> Guido>But if I had to do it over again, I wouldn't have added walk() in the Guido>current form. I often find it harder to fit a particular program's Guido>needs in the API offered by walk() than it is to reimplement the walk Guido>myself. That's why I'm concerned about adding to it. The change is small and the interface is backward compatible, but if you are actually trying to discourage people from using os.path.walk() in the future then I would vote for deprecating it and replacing it with a generator where the default is depthfirst ;-) Below is a sample tree walker using a generator I was delighted to find that they work in recursive functions, but it gave me a headache to think about for the first time. Perhaps it could be prettier, but this demonstrates the basic idea. Yours, Noah # Inspired by Doug Fort from an ActiveState Python recipe. # His version didn't use recursion and didn't do depth first. import os import stat def walktree (top = ".", depthfirst = True): """This walks a directory tree, starting from the 'top' directory. This is somewhat like os.path.walk, but using generators instead of a visit function. One important difference is that walktree() defaults to DEPTH first with optional BREADTH first, whereas the os.path.walk function allows only BREADTH first. Depth first was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory. """ names = os.listdir(top) if not depthfirst: yield top, names for name in names: try: st = os.lstat(os.path.join(top, name)) except os.error: continue if stat.S_ISDIR(st.st_mode): for (newtop, children) in walktree (os.path.join(top, name), depthfirst): yield newtop, children if depthfirst: yield top, names def test(): for (basepath, children) in walktree(): for child in children: print os.path.join(basepath, child) if __name__ == '__main__': test()
data:image/s3,"s3://crabby-images/58a0b/58a0be886f0375938476d3eb7345a8b9d8cdc91e" alt=""
Noah Spurrier wrote:
I'm tempted to declare this off-topic: depth-first means "traverse children before traversing siblings". Depth-first comes in three variations: pre-order (traverse node first, then children, then siblings), in-order (only for binary trees: traverse left child first, then node, then right child, then sibling), post-order (traverse children first, then node, then siblings). There is also breadth-first: traverse siblings first, then children.
I write these little directory/file filters quite often. I have come across this problem of renaming the directories you are traversing before.
I still can't understand why you can't use os.path.walk for that. Did you know that you can modify the list that is passed to the callback, and that walk will continue to visit the elements in the list? Regards, Martin
data:image/s3,"s3://crabby-images/e88a6/e88a6d57abf46790782357b4e08a5f8aa28e22e4" alt=""
[Noah Spurrier]
[Martin v. Löwis]
Let's spell it out. Say the directory structure is like so: a/ b/ c/ d/ e/ and we want to stick "x" at the end of each directory name. The first thing the callback sees is arg, "a", ["b", "e"] The callback can rename b and e, and change the contents of the fnames list to ["bx", "ex"] so that walk will find the renamed directories. Etc. This works: """ import os def renamer(arg, dirname, fnames): for i, name in enumerate(fnames): if os.path.isdir(os.path.join(dirname, name)): newname = name + "x" os.rename(os.path.join(dirname, name), os.path.join(dirname, newname)) fnames[i] = newname # crucial! os.path.walk('a', renamer, None) """ It's certainly less bother renaming bottom-up; this works too (given the last walk() generator implementation I posted): """ import os for root, dirs, files in walk('a', topdown=False): for d in dirs: os.rename(os.path.join(root, d), os.path.join(root, d + 'x')) """ A possible surprise is that neither of these renames 'a'.
participants (10)
-
"Martin v. Löwis"
-
David Ascher
-
Guido van Rossum
-
Jeremy Fincher
-
Michael Hudson
-
Neil Schemenauer
-
Noah Spurrier
-
pinard@iro.umontreal.ca
-
Tim Peters
-
Walter Dörwald