postprocessing in os.walk

Dave Angel davea at ieee.org
Tue Oct 13 12:31:25 EDT 2009


Peter Otten wrote:
> kj wrote:
>
>   
>> In <mailman.1196.1255347115.2807.python-list at python.org> Dave Angel
>> <davea at ieee.org> writes:
>>
>>     
>>> kj wrote:
>>>       
>>>> Perl's directory tree traversal facility is provided by the function
>>>> find of the File::Find module.  This function accepts an optional
>>>> callback, called postprocess, that gets invoked "just before leaving
>>>> the currently processed directory."  The documentation goes on to
>>>> say "This hook is handy for summarizing a directory, such as
>>>> calculating its disk usage", which is exactly what I use it for in
>>>> a maintenance script.
>>>>
>>>> This maintenance script is getting long in the tooth, and I've been
>>>> meaning to add a few enhancements to it for a while, so I thought
>>>> that in the process I'd port it to Python, using the os.walk
>>>> function, but I see that os.walk does not have anything like this
>>>> File::Find::find's postprocess hook.  Is there a good way to simulate
>>>> it (without having to roll my own File::Find::find in Python)?
>>>>
>>>> TIA!
>>>>
>>>> kynn
>>>>
>>>>   
>>>>         
>>> Why would you need a special hook when the os.walk() generator yields
>>> exactly once per directory?  So whatever work you do on the list of
>>> files you get, you can then put the summary logic immediately after.
>>>       
>>> Or if you really feel you need a special hook, then write a wrapper for
>>> os.walk(), which takes a hook function as a parameter, and after
>>> yielding each file in a directory, calls the hook.  Looks like about 5
>>> lines.
>>>       
>> I think you're missing the point.  The hook in question has to be
>> called *immediately after* all the subtrees that are rooted in
>> subdirectories contained in the current directory have been visited
>> by os.walk.
>>
>> I'd love to see your "5 lines" for *that*.
>>     
>
> import os
>
> def find(root, process):
>     for pdf in os.walk(root, topdown=False):
>             process(*pdf)
>
> def process(path, dirs, files):
>     print path
>
> find(".", process)
>
> Peter
>
>
>
>   
Thanks Peter,

To expand it to five lines, and make it the generator I mentioned,

import os

def find(root, process):
    for pdf in os.walk(root, topdown=False):
            for file in pdf[2]:
                yield os.path.join(pdf[0],file)
            process(*pdf)

def process(path, dirs, files):
    print "hooked --", path

for fullpath in find("..", process):
    print fullpath


This is a generator which yields each file in a directory tree, and 
after all the files below a particular directory are processed, 
"immediately" calls the hook

DaveA



More information about the Python-list mailing list