os.walk restart

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Mar 17 19:09:28 EDT 2010


On Wed, 17 Mar 2010 19:04:14 -0300, Keir Vaughan-taylor <keirvt at gmail.com>  
wrote:

> I am traversing a large set of directories using
>
> for root, dirs, files in os.walk(basedir):
>     run program
>
> Because the directory set is huge, a single traversal takes days.
> Sometimes it crashes because of a programming error.
> As each directory is processed, its name is written to a file.
> I want to be able to restart the walk from the directory where it
> crashed.
>
> Is this possible?

If the 'dirs' list were guaranteed to be sorted, you could remove from it,  
at each level, all the directories already traversed. But it's not :(
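os.walk does not sort 'dirs' for you, but with the default topdown=True you may reorder that list in place, and walk() then visits the subdirectories in the order you left. A minimal sketch of the pruning idea built on that, assuming the last line of the progress log holds the last directory fully processed, exactly as the 'root' string os.walk yielded it. The name resumable_walk() and its last_done parameter are hypothetical. Paths are compared as tuples of components rather than as strings, because as strings "a/b" sorts after "a-x" even though the sorted walk visits a/b first.

```python
import os


def resumable_walk(basedir, last_done=None):
    """Yield (root, dirs, files) like os.walk, in sorted order, skipping
    every directory up to and including last_done.

    last_done (hypothetical parameter) is the last directory the previous
    run finished, as read from its log, or None to start from scratch.
    """
    def key(path):
        # Path as a tuple of components relative to basedir; basedir is ().
        # With sorted dirs, the walk visits directories in tuple order.
        rel = os.path.relpath(path, basedir)
        return () if rel == os.curdir else tuple(rel.split(os.sep))

    last = None if last_done is None else key(last_done)
    for root, dirs, files in os.walk(basedir):
        # Sorting in place makes os.walk descend in this order.
        dirs.sort()
        if last is not None:
            here = key(root)
            # Keep a child if it sorts after last_done, or if it is
            # last_done or one of its ancestors (its subtree still holds
            # unprocessed directories); prune everything else.
            dirs[:] = [d for d in dirs
                       if here + (d,) > last
                       or last[:len(here) + 1] == here + (d,)]
            if here <= last:
                continue  # processed by the previous run
        yield root, dirs, files
```

Usage would be something like `for root, dirs, files in resumable_walk(basedir, last_line_of_log):`. If the log entry is written before a directory is processed rather than after, pass the previous entry instead, so the crashed directory is redone.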

Perhaps a better approach would be to collect, once, all the directories to  
be processed and write them to a text file -- these are the pending  
directories. Then read from the pending file and process every directory  
in it. If the process aborts for any reason, manually delete the lines  
already processed and restart.
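A minimal sketch of that two-step scheme. The function names write_pending() and process_pending() are hypothetical, and 'process' stands for whatever "run program" does. It assumes no directory name contains a newline, since the file holds one path per line; errors="surrogateescape" lets undecodable names on POSIX survive the round trip.

```python
import os


def write_pending(basedir, pending_path):
    """Walk basedir once and write every directory to pending_path, one per line."""
    with open(pending_path, "w", encoding="utf-8",
              errors="surrogateescape") as out:
        for root, dirs, files in os.walk(basedir):
            out.write(root + "\n")


def process_pending(pending_path, process):
    """Call process(directory) for every line still listed in pending_path.

    If this raises part-way through, delete the lines already processed
    from pending_path by hand and call it again to carry on from there.
    """
    with open(pending_path, encoding="utf-8",
              errors="surrogateescape") as pending:
        for line in pending:
            directory = line.rstrip("\n")
            if directory:  # tolerate blank lines left by manual editing
                process(directory)
```

The slow walk happens only once; every restart just rereads what is left of the file.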

If you use a database instead of a text file, and mark each entry as "done"  
after processing it, you can avoid that last manual step and the whole  
process can keep running unattended. In some cases you may want to  
choose the starting point at random.
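With the standard library's sqlite3 module as the database, a sketch might look like this. The table, index and function names (dirs, dirs_pending, load_directories(), process_all()) are made up for illustration. Each directory is marked done and committed right after it is processed, so a rerun after a crash resumes with whatever is still pending, and random_order=True picks the next pending directory at random.

```python
import os
import sqlite3


def load_directories(conn, basedir):
    """Create the work table if needed and record every directory as pending."""
    conn.execute("CREATE TABLE IF NOT EXISTS dirs ("
                 "path TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)")
    # Index so finding the next pending row does not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS dirs_pending ON dirs (done, path)")
    with conn:  # one transaction for the whole initial listing
        # INSERT OR IGNORE: reloading never resets entries already done.
        conn.executemany("INSERT OR IGNORE INTO dirs (path) VALUES (?)",
                         ((root,) for root, _, _ in os.walk(basedir)))


def process_all(conn, process, random_order=False):
    """Process pending directories until none are left, marking each one done."""
    order = "RANDOM()" if random_order else "path"
    while True:
        row = conn.execute("SELECT path FROM dirs WHERE done = 0 "
                           f"ORDER BY {order} LIMIT 1").fetchone()
        if row is None:
            break
        process(row[0])
        with conn:  # commit at once so the mark survives a later crash
            conn.execute("UPDATE dirs SET done = 1 WHERE path = ?", (row[0],))
```

Use a file-backed connection such as sqlite3.connect("walk.db") so progress outlives the process. If process() raises, that directory stays pending and is retried on the next run.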

-- 
Gabriel Genellina




More information about the Python-list mailing list