os.walk restart
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Wed Mar 17 19:09:28 EDT 2010
On Wed, 17 Mar 2010 19:04:14 -0300, Keir Vaughan-taylor <keirvt at gmail.com>
wrote:
> I am traversing a large set of directories using
>
> for root, dirs, files in os.walk(basedir):
>     run program
>
> Being a huge directory set, the traversal is taking days.
> Sometimes there is a crash because of a programming error.
> As each directory is processed, the name of the directory is written
> to a file.
> I want to be able to restart the walk from the directory where it
> crashed.
>
> Is this possible?
If the 'dirs' list were guaranteed to be sorted, you could remove, at each
level, all the directories that were already traversed. But it's not :(
os.walk yields the names in arbitrary order.
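That said, with the default topdown=True, os.walk lets the caller modify
'dirs' in place, so you can impose the order yourself. Here is a rough sketch
of that idea, not a tested production recipe. The name resumable_walk and its
resume_from parameter are made up for illustration, and it assumes the tree
does not change between runs and that the last name in your log is the
directory to redo.

```python
import os

def resumable_walk(basedir, resume_from=None):
    """Like os.walk(basedir), but in sorted order, optionally skipping
    everything that comes before resume_from (hypothetical helper)."""
    # Path components of the restart point, relative to basedir.
    target = []
    if resume_from is not None:
        rel = os.path.relpath(resume_from, basedir)
        if rel != os.curdir:
            target = rel.split(os.sep)

    for root, dirs, files in os.walk(basedir):  # topdown=True by default
        dirs.sort()  # in-place sort makes the traversal order repeatable
        rel = os.path.relpath(root, basedir)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        depth = len(parts)
        if depth < len(target) and parts == target[:depth]:
            # An ancestor of the restart point: it was processed in the
            # earlier run, so skip it and prune the subdirectories that
            # sort before the next component of the restart path.
            dirs[:] = [d for d in dirs if d >= target[depth]]
            continue
        yield root, dirs, files
```

Used as "for root, dirs, files in resumable_walk(basedir, last_logged_dir):",
it starts again at the logged directory and continues with everything that
sorts after it.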
Perhaps a better approach would be to collect, once, all the directories to
be processed and write them to a text file; these are the pending
directories. Then read the pending file and process every directory in it.
If the process aborts for any reason, manually delete the lines already
processed and restart.
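A minimal sketch of the pending-file idea follows. As a variation on the
manual step, it appends each finished directory to a second "done" file and
skips those entries on restart; that log is my addition, as are the names
write_pending, process_pending and the 'process' callback. It assumes no
directory name contains a newline.

```python
import os

def write_pending(basedir, pending_path):
    """Walk the tree once and list every directory, one per line."""
    count = 0
    with open(pending_path, "w", encoding="utf-8") as f:
        for root, _dirs, _files in os.walk(basedir):
            f.write(root + "\n")
            count += 1
    return count

def process_pending(pending_path, done_path, process):
    """Call process(directory) for each pending entry not yet logged as done."""
    done = set()
    if os.path.exists(done_path):
        with open(done_path, encoding="utf-8") as f:
            done = {line.rstrip("\n") for line in f}
    with open(pending_path, encoding="utf-8") as pending, \
         open(done_path, "a", encoding="utf-8") as log:
        for line in pending:
            directory = line.rstrip("\n")
            if directory in done:
                continue  # finished in an earlier run
            process(directory)  # may raise; the entry then stays pending
            log.write(directory + "\n")
            log.flush()  # hand the line to the OS before moving on
```

Run write_pending once, then call process_pending as many times as it takes;
each restart picks up at the first directory that was never logged as done.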
If you use a database instead of a text file, and mark entries as "done"
after processing, you can avoid that last manual step and the whole
process may be kept running automatically. In some cases you may want to
choose the starting point at random.
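For the database variant, something along these lines might do, using the
sqlite3 module from the standard library. The table layout, the function
names init_queue and run_queue, and the 'randomize' flag are all assumptions
of mine, just one way to arrange it.

```python
import os
import sqlite3

def init_queue(db_path, basedir):
    """Create the table if needed and record every directory as pending."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS dirs ("
        "path TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )
    # INSERT OR IGNORE keeps the 'done' marks from earlier runs intact.
    con.executemany(
        "INSERT OR IGNORE INTO dirs (path) VALUES (?)",
        ((root,) for root, _dirs, _files in os.walk(basedir)),
    )
    con.commit()
    return con

def run_queue(con, process, randomize=False):
    """Process pending directories one by one, marking each as done."""
    order = "ORDER BY RANDOM() " if randomize else ""
    query = "SELECT path FROM dirs WHERE done = 0 " + order + "LIMIT 1"
    while True:
        row = con.execute(query).fetchone()
        if row is None:
            break  # nothing left to do
        process(row[0])  # may raise; the row then stays pending
        con.execute("UPDATE dirs SET done = 1 WHERE path = ?", (row[0],))
        con.commit()
```

After a crash, just call run_queue again on the same database file. Note that
ORDER BY RANDOM() scans all the pending rows on every pick, so on a very
large table you may prefer to leave randomize off.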
--
Gabriel Genellina