synching with os.walk()
petercable at gmail.com
Mon Nov 27 04:46:37 EST 2006
On Nov 24, 7:57 am, "Andre Meyer" <m... at acm.org> wrote:
>
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.
>
I wrote a script to perform this function using the dircmp class in the
filecmp module. I did something similar to this:
import filecmp, os, shutil

def backup(d1, d2):
    # Copy anything new or changed in d1 over to d2 (d2 must already exist).
    print('backing up %s to %s' % (d1, d2))
    compare = filecmp.dircmp(d1, d2)
    # Entries that exist only in the source: copy them over wholesale.
    for item in compare.left_only:
        fullpath = os.path.join(d1, item)
        if os.path.isdir(fullpath):
            shutil.copytree(fullpath, os.path.join(d2, item))
        elif os.path.isfile(fullpath):
            shutil.copy2(fullpath, d2)
    # Files present on both sides that dircmp reports as different.
    for item in compare.diff_files:
        shutil.copy2(os.path.join(d1, item), d2)
    # Directories present on both sides: descend into each pair.
    for item in compare.common_dirs:
        backup(os.path.join(d1, item), os.path.join(d2, item))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 3:
        backup(sys.argv[1], sys.argv[2])
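
If you would rather stay with os.walk() and avoid the explicit recursion, one option is to walk only the source tree and derive the matching destination path from the relative path. This is just a rough sketch, not what my script does: the name backup_walk and the list it returns are made up for illustration, and it assumes filecmp.cmp()'s default shallow comparison is good enough for you.

```python
import filecmp
import os
import shutil


def backup_walk(src, dst):
    """Copy new or changed files from src to dst using a single os.walk() pass.

    Hypothetical helper for illustration; returns the relative paths copied.
    """
    copied = []
    for dirpath, dirnames, filenames in os.walk(src):
        # Map the current source directory onto its destination counterpart.
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == os.curdir else os.path.join(dst, rel)
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            # Copy when the file is missing on the other side or compares
            # unequal (filecmp.cmp checks os.stat() signatures first).
            if not os.path.isfile(target) or not filecmp.cmp(source, target):
                shutil.copy2(source, target)
                copied.append(os.path.relpath(source, src))
    return copied
```

Like the function above, this only ever adds or overwrites things in the destination. It never deletes anything that has disappeared from the source.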
My actual script also has some error checking and keeps up to 5 previous versions
of a changed file. I find it very efficient, even with recursion, as it
only actually copies those files that have changed. I sync somewhere
around 5 GB worth of files nightly across the network and I haven't had
any trouble.
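
For the previous-versions part, something along these lines would do it. Again this is only a sketch under my own assumptions: the helper name rotate_versions and the numbered-suffix naming (file.1 is the newest old copy, file.5 the oldest) are invented here, and you would call it on the destination file just before overwriting it.

```python
import os


def rotate_versions(path, keep=5):
    """Shift path -> path.1 -> path.2 ..., keeping at most `keep` old copies.

    Hypothetical helper; the numbered-suffix scheme is an assumption.
    """
    if not os.path.isfile(path):
        return  # nothing to preserve yet
    # Drop the oldest copy so the shift below never overwrites a file.
    oldest = '%s.%d' % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift from the oldest remaining copy down to the newest.
    for n in range(keep - 1, 0, -1):
        older = '%s.%d' % (path, n)
        if os.path.exists(older):
            os.rename(older, '%s.%d' % (path, n + 1))
    # The current file becomes version 1; the caller then writes the new one.
    os.rename(path, path + '.1')
```

In the backup function you would call rotate_versions(os.path.join(d2, item)) in the diff_files loop, right before the shutil.copy2() call.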
Of course, if I just had rsync available, I would use that.
Hope this helps,
Pete