dirwalk.py generator version of os.path.walk
jimd at vega.starshine.org
Thu Feb 28 23:41:11 CET 2002
In article <just-BED145.11365928022002 at news1.xs4all.nl>, Just van Rossum wrote:
>In article <a5kv49$ae8$2 at news.idiom.com>,
> jimd at vega.starshine.org (Jim Dennis) wrote:
>>> I wrote a different implementation of this general concept at:
>>> You don't really need to keep a stack of directories and push/pop
>>> things, because with generators you can recurse instead.
BTW: Tom, I did come across your version in the ASPN a day or so
after I wrote mine. I agree it looks cleaner and simpler.
I might still use a list (FIFO or stack) and while loop instead
of the recursion. (see below).
>> But recursion is likely to cost more. The only state I need
>> to keep is my current "todo" list of directories. A recursion
>> would store functional state (unless Python supported tail-end
>> recursion). So the append/pop (total cost, 3 lines of code)
>> seems like the lightest weight way to do this.
> Why don't you try it out? Setting up a generator is as expensive as
> calling a function, but resuming or yielding a generator is _very_ cheap.
> I personally find the recursive version clearer.
I guess I'm just wary of recursion, particularly when I've read
that Python doesn't support tail-end recursion.
However, I agree that a good test is worth infinitely more than
any amount of theoretical speculation.
I've read that there's some sort of profiling module for Python;
how would I profile each of these approaches in terms of footprint
CPU utilization and wall-clock time?
As I say, I'm still concerned that the generator will have lots
of stack frame overhead in recursion while the interative approach
just incurs the size of the list.
I'm considering playing with my dirwalk() a bit more (possibly
adding additional options to it, or implementing it as a class and
writing a set of Decorators to add some of the features that I
mentioned in my OP. I suspect I need to design it as a hybrid;
options that affect the depth and traversal (i.e. not crossing
mount points, setting max depth, etc) should probably be added
in, things that affect the yield()'d results (anything returned
by stat(), filename or regex patterns, contents scans, etc) should
probably be done in Decorators).
I'm just trying to learn how to be a "real programmer" so this is
mostly for personal edification. However, it would be nice if I
created a useful Python module which provided all of the power
for Python that the "find" command provides to the shell. (Of course
we already *can* do most of that with os.path.walk, and we could
use popen() on a find command, or we could use either of these
dirwalk() generators and do our own stats, etc. However, I'd like
to write something that makes it a bit easier and more concise.
More information about the Python-list