dirwalk.py generator version of os.path.walk

Jim Dennis jimd at vega.starshine.org
Thu Feb 28 23:41:11 CET 2002


In article <just-BED145.11365928022002 at news1.xs4all.nl>, Just van Rossum wrote:
>In article <a5kv49$ae8$2 at news.idiom.com>,
> jimd at vega.starshine.org (Jim Dennis) wrote:

>>> I wrote a different implementation of this general concept at:

>>> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/105873

>>> You don't really need to keep a stack of directories and push/pop
>>> things, because with generators you can recurse instead.
>>> Tom
 
  BTW: Tom, I did come across your version in the ASPN a day or so
  after I wrote mine.  I agree it looks cleaner and simpler.
  I might still use a list (FIFO or stack) and while loop instead
  of the recursion. (see below).

>>  But recursion is likely to cost more.  The only state I need
>>  to keep is my current "todo" list of directories.  A recursion 
>>  would store functional state (unless Python supported tail-end
>>  recursion).  So the append/pop (total cost, 3 lines of code)
>>  seems like the lightest weight way to do this.

> Why don't you try it out? Setting up a generator is as expensive as 
> calling a function, but resuming or yielding a generator is _very_ cheap.

> I personally find the recursive version clearer.
> Just

 I guess I'm just wary of recursion, particularly when I've read
 that Python doesn't support tail-end recursion.  

 However, I agree that a good test is worth infinitely more than
 any amount of theoretical speculation.

 I've read that there's some sort of profiling module for Python;
 how would I profile each of these approaches in terms of footprint
 CPU utilization and wall-clock time?  
 
 As I say, I'm still concerned that the generator will have lots 
 of stack frame overhead in recursion while the interative approach 
 just incurs the size of the list.

 I'm considering playing with my dirwalk() a bit more (possibly 
 adding additional options to it, or implementing it as a class and
 writing a set of Decorators to add some of the features that I 
 mentioned in my OP.  I suspect I need to design it as a hybrid; 
 options that affect the depth and traversal (i.e. not crossing
 mount points, setting max depth, etc) should probably be added
 in, things that affect the yield()'d results (anything returned
 by stat(), filename or regex patterns, contents scans, etc) should
 probably be done in Decorators).
 
 I'm just trying to learn how to be a "real programmer" so this is
 mostly for personal edification.  However, it would be nice if I
 created a useful Python module which provided all of the power
 for Python that the "find" command provides to the shell.  (Of course
 we already *can* do most of that with os.path.walk, and we could 
 use popen() on a find command, or we could use either of these
 dirwalk() generators and do our own stats, etc.  However, I'd like
 to write something that makes it a bit easier and more concise.  
 




More information about the Python-list mailing list