os.walk/list
Peter Otten
__peter__ at web.de
Sun Mar 20 04:28:38 EDT 2011
ecu_jon wrote:
> so i am trying to add md5 checksum calc to my file copy stuff, to make
> sure the source and dest. are same file.
> i implemented it fine with the single file copy part. something like :
> for files in sourcepath:
>     f1=file(files ,'rb')
>     try:
>         shutil.copy2(files,
>             os.path.join(destpath,os.path.basename(files)))
>     except:
>         print "error file"
>     f2=file(os.path.join(destpath,os.path.basename(files)), 'rb')
>     truth = md5.new(f1.read()).digest() ==
>         md5.new(f2.read()).digest()
>     if truth == 0:
>         print "file copy error"
>
> this worked swimmingly. i moved on to my backupall function, something
> like
> for (path, dirs, files) in os.walk(source):
>     #os.walk drills down thru all the folders of source
>     for fname in dirs:
>         currentdir = destination+leftover
>         try:
>             os.mkdir(os.path.join(currentdir,fname),0755)
>         except:
>             print "error folder"
>     for fname in files:
>         leftover = path.replace(source, '')
>         currentdir = destination+leftover
>         f1=file(files ,'rb')
>         try:
>             shutil.copy2(os.path.join(path,fname),
>                 os.path.join(currentdir,fname))
>             f2 = file(os.path.join(currentdir,fname,files))
>         except:
>             print "error file"
>         truth = md5.new(f1.read()).digest() ==
>             md5.new(f2.read()).digest()
>         if truth == 0:
>             print "file copy error"
>
> but here, "fname" is a list, not a single file. i didn't really want to
> spend a lot of time on the md5 part. thought it would be an easy add-
> on. i don't really want to write the file names out to a list and
> parse through them one a time doing the calc, but it sounds like i
> will have to do something like that.
If you have something working for one file, don't copy the code into the
os.walk() for-loop; put it into a function, say:

    def safe_copy(sourcefile, destfolder):
        # your code

Then call that thoroughly tested function from within the os.walk() loop:

    for path, folders, files in os.walk(sourceroot):
        destfolder = ... # os.path.relpath() might help here
        # ... (make subdirectories)
        for name in files:
            sourcefile = os.path.join(path, name)
            safe_copy(sourcefile, destfolder)

If you find a bug in safe_copy() you'll only have to fix it in one place.
Also, you can test it with a single file which should be easier and faster
than processing a whole directory tree.
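
To make the loop above concrete, here is a minimal sketch of one way to fill in the elided parts, written for Python 3. The name backup_all() and its destroot parameter are made up for illustration, safe_copy() is only a stand-in for your own tested routine, and creating the subdirectories with os.makedirs(..., exist_ok=True) is an assumption, not the only option:

```python
import os
import shutil


def safe_copy(sourcefile, destfolder):
    """Stand-in for the tested single-file routine: copy and return the new path."""
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    shutil.copy2(sourcefile, destfile)
    return destfile


def backup_all(sourceroot, destroot):
    """Mirror the tree under sourceroot into destroot (hypothetical helper)."""
    copied = []
    for path, folders, files in os.walk(sourceroot):
        # Location of the current folder relative to the source root
        # ("." for the top-level folder itself).
        relative = os.path.relpath(path, sourceroot)
        destfolder = os.path.normpath(os.path.join(destroot, relative))
        # os.walk() visits every folder, so this also recreates empty ones.
        os.makedirs(destfolder, exist_ok=True)
        for name in files:
            sourcefile = os.path.join(path, name)
            copied.append(safe_copy(sourcefile, destfolder))
    return copied
```

Because the destination folder is derived from os.path.relpath(), there is no need for the string replacement on the path, and each folder is created when os.walk() reaches it instead of in a separate loop over the subfolder names.
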
Generally speaking, breaking code into small functions that can be tested
individually is a powerful technique. And you don't have to stop here; you
can break safe_copy() into smaller steps:

    def safe_copy(sourcefile, destfolder):
        destfile = ...
        copyfile(sourcefile, destfile)
        if not equal_content(sourcefile, destfile):
            # print a warning or raise an exception
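
A rough sketch of that breakdown, assuming Python 3, shutil.copy2() as the copy step (as in the quoted code) and an MD5 comparison via hashlib. file_digest() is a hypothetical helper name, the chunk size is arbitrary, and raising IOError is just one of the two reactions mentioned in the comment:

```python
import hashlib
import os
import shutil


def file_digest(filename, chunksize=2**16):
    """Return the MD5 digest of a file, read in chunks to limit memory use."""
    md5 = hashlib.md5()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(chunksize), b""):
            md5.update(chunk)
    return md5.digest()


def equal_content(file_a, file_b):
    """True if both files have the same MD5 digest."""
    return file_digest(file_a) == file_digest(file_b)


def safe_copy(sourcefile, destfolder):
    """Copy sourcefile into destfolder and verify the copy afterwards."""
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    shutil.copy2(sourcefile, destfile)
    if not equal_content(sourcefile, destfile):
        raise IOError("file copy error: %r" % destfile)
    return destfile
```

Each of the three functions can be tried on a single small file before it is used on a whole directory tree.
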
Sometimes you'll even find that the smaller, more specialized routines
already exist in the standard library.
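
One candidate for the comparison step is filecmp.cmp(); that it is the routine meant here, and that it suits this backup script, are both assumptions. A minimal sketch for Python 3, reusing the illustrative name equal_content():

```python
import filecmp


def equal_content(file_a, file_b):
    """Compare two files using the standard library's filecmp module."""
    # shallow=False makes filecmp compare the file contents instead of
    # accepting matching os.stat() signatures (type, size, mtime) as proof.
    return filecmp.cmp(file_a, file_b, shallow=False)
```

This compares the bytes directly and does not compute a checksum, so it only fits if the MD5 value itself is not needed elsewhere.
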