[Guido]
But if I had to do it over again, I wouldn't have added walk() in the current form.
[Neil Schemenauer]
I think it's the perfect place for a generator.
[Guido]
Absolutely! So let's try to write something new based on generators, make it flexible enough so that it can handle pre-order or post-order visits, and then phase out os.path.walk().
I posted one last night, with a bug (it failed to pass the topdown flag through to recursive calls). Here's that again, with the bug repaired, sped up some, and with a docstring. Double duty: the example in the docstring shows why we don't want to make a special case out of sum([]): empty lists can arise naturally.

What else would people like in this? I really like separating the directory names from the plain-file names, so don't bother griping about that <wink>.

It's at least as fast as the current os.path.walk() (it's generally faster for me, but times for this are extremely variable on Win98). Removing the internal recursion doesn't appear to make a measurable difference when walking my Python tree. But recursive generators require time proportional to the current stack depth to deliver a result to the caller, and to resume again, so removing recursion could be much more efficient on an extremely deep tree.

The biggest speedup I could find on Windows came from using os.chdir() liberally, so that os.path.join() calls weren't needed and os.path.isdir() calls worked directly on one-component names. I suspect this is because Win98 doesn't have an effective way to cache directory lookups under the covers. Even so, it only amounted to a 10% speedup: directory walking is plain slow on Win98 no matter how you do it. The attached doesn't play any gross speed tricks.
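
For readers who want to see the general shape being discussed, what follows is not the attached code but a minimal sketch of a recursive generator walk, written with present-day Python syntax (`yield from`, `tempfile.TemporaryDirectory`). The names `walk_sketch` and `demo` are hypothetical, and the choices to skip unreadable directories and not follow symlinks are assumptions made for the illustration. It keeps directory names separate from plain-file names, passes the topdown flag through to recursive calls, and shows an empty file list feeding sum() without a special case.

```python
import os
import tempfile


def walk_sketch(top, topdown=True):
    """Yield (dirpath, dirnames, filenames) for each directory under top.

    Hypothetical illustration only; not the code attached to this message.
    """
    try:
        names = os.listdir(top)
    except OSError:
        return  # assumption: silently skip directories we cannot read

    # Keep subdirectory names apart from plain-file names.
    dirs, nondirs = [], []
    for name in names:
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs      # pre-order: parent before children
    for name in dirs:
        path = os.path.join(top, name)
        if not os.path.islink(path):  # assumption: don't follow symlinks
            # The topdown flag must be passed through to the recursive call.
            yield from walk_sketch(path, topdown)
    if not topdown:
        yield top, dirs, nondirs      # post-order: children before parent


def demo():
    """Build a small throwaway tree and total the file sizes per directory."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "empty"))
        with open(os.path.join(root, "a", "data.txt"), "w") as f:
            f.write("hello")
        sizes = {}
        for dirpath, dirnames, filenames in walk_sketch(root):
            # A directory with no plain files yields an empty list here,
            # and sum() over nothing is simply 0.
            sizes[os.path.relpath(dirpath, root)] = sum(
                os.path.getsize(os.path.join(dirpath, n)) for n in filenames
            )
        return sizes
```

Calling demo() maps each directory (relative to the temporary root) to the total size of its plain files; the directories that contain no files come out as 0 with no extra handling. Each level of recursion adds one more generator for a result to pass through, which is the stack-depth cost mentioned above.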