synching with os.walk()

petercable at gmail.com
Mon Nov 27 04:46:37 EST 2006



On Nov 24, 7:57 am, "Andre Meyer" <m... at acm.org> wrote:
>
> os.walk() is a nice generator for performing actions on all files in a
> directory and subdirectories. However, how can one use os.walk() for walking
> through two hierarchies at once? I want to synchronise two directories (just
> backup for now), but cannot see how I can traverse a second one. I do this
> now with os.listdir() recursively, which works fine, but I am afraid that
> recursion can become inefficient for large hierarchies.
>

I wrote a script that does this using the dircmp class in the filecmp
module. It looks something like this:

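import filecmp, os, shutil

def backup(d1, d2):
    """Copy new and changed files from d1 into d2 (one-way sync)."""
    # print() with a single argument works in both Python 2 and Python 3.
    print('backing up %s to %s' % (d1, d2))
    compare = filecmp.dircmp(d1, d2)
    # Entries that exist only in the source: copy them over wholesale.
    for item in compare.left_only:
        fullpath = os.path.join(d1, item)
        if os.path.isdir(fullpath):
            shutil.copytree(fullpath, os.path.join(d2, item))
        elif os.path.isfile(fullpath):
            shutil.copy2(fullpath, d2)
    # Files present on both sides that dircmp reports as different.
    for item in compare.diff_files:
        shutil.copy2(os.path.join(d1, item), d2)
    # Recurse into directories present on both sides.
    for item in compare.common_dirs:
        backup(os.path.join(d1, item), os.path.join(d2, item))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 3:
        backup(sys.argv[1], sys.argv[2])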
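
For the recursion concern raised in the question, the same one-way
backup can also be sketched iteratively with os.walk(), by mapping each
source directory onto the destination tree. This is only a minimal
sketch, not the script described in this post: the name backup_walk and
its return value are made up for illustration, it compares files with
filecmp.cmp() rather than dircmp, and it does not handle symlinks or a
file on one side clashing with a directory on the other.

```python
import filecmp
import os
import shutil

def backup_walk(src, dst):
    """One-way backup of src into dst using os.walk() instead of recursion.

    Hypothetical helper for illustration; returns the list of files copied.
    """
    copied = []
    for dirpath, dirnames, filenames in os.walk(src):
        # Map the current source directory onto the destination tree.
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            # Copy files that are missing from the destination or that
            # filecmp.cmp() reports as different.
            if not os.path.exists(target) or not filecmp.cmp(source, target):
                shutil.copy2(source, target)
                copied.append(target)
    return copied
```

Note that the recursive version only recurses as deep as the directories
are nested, so recursion depth is rarely the limiting factor in practice.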

My actual script adds some error checking and keeps up to 5 previous
versions of each changed file. I find it very efficient, even with
recursion, because it only copies the files that have changed. I sync
around 5 GB of files nightly across the network and haven't had any
trouble.
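
The post does not show how the previous versions are kept. One possible
approach, sketched here purely as an assumption, is to rotate numbered
copies (file.1 is the newest old version, file.5 the oldest) before
overwriting the destination. The names rotate_versions and
copy_with_history and the ".N" suffix scheme are hypothetical, not taken
from the original script.

```python
import os
import shutil

def rotate_versions(path, keep=5):
    """Shift path -> path.1 -> ... -> path.<keep>, dropping the oldest copy."""
    oldest = '%s.%d' % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Rename from the highest number down so nothing gets overwritten.
    for n in range(keep - 1, 0, -1):
        older = '%s.%d' % (path, n)
        if os.path.exists(older):
            os.rename(older, '%s.%d' % (path, n + 1))
    if os.path.exists(path):
        os.rename(path, path + '.1')

def copy_with_history(source, target, keep=5):
    """Copy source over target, keeping up to `keep` earlier versions."""
    rotate_versions(target, keep)
    shutil.copy2(source, target)
```

In backup(), a call such as copy_with_history(os.path.join(d1, item),
os.path.join(d2, item)) could stand in for the shutil.copy2() call in
the diff_files loop. The numbered copies exist only in the destination,
so dircmp would list them under right_only, which backup() never touches.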

Of course, if I just had rsync available, I would use that.

Hope this helps,

Pete



