Determining when a file has finished copying

Sean DiZazzo half.italian at gmail.com
Fri Jul 11 22:44:17 CEST 2008


On Jul 9, 5:34 pm, keith <ke... at keithperkins.net> wrote:
>
> Ethan Furman wrote:
> > writeson wrote:
> >> Guys,
>
> >> Thanks for your replies, they are helpful. I should have included in
> >> my initial question that I don't have as much control over the program
> >> that writes (pgm-W) as I'd like. Otherwise, the write to a different
> >> filename and then rename solution would work great. Is there no way
> >> to tell from os.stat() when the file has finished being copied? I
> >> ran some test programs, one of which continuously copies big files
> >> from one directory to another, and another that continuously does a
> >> glob.glob("*.pdf") on those files and looks at the st_atime and
> >> st_mtime parts of the return value of os.stat(filename). From that
> >> experiment it looks like st_atime and st_mtime equal each other
> >> until the file has finished being copied. Nothing in the
> >> documentation about st_atime or st_mtime leads me to think this is
> >> true; it's just what I observed with the two test programs I've
> >> described.
>
> >> Any thoughts? Thanks!
> >> Doug
>
> > The solution my team has used is to monitor the file size.  If the file
> > has stopped growing for x amount of time (we use 45 seconds) the file is
> > done copying.  Not elegant, but it works.
> > --
> > Ethan
>
> Also I think that matching the md5sums may work.  Just set up so that it
> checks the copy's md5sum every couple of seconds (or whatever time
> interval you want) and matches against the original's.  When they match
> copying's done. I haven't actually tried this but think it may work.
> Any more experienced programmers out there let me know if this is
> unworkable please.
> K

I use a combination of both: os.stat() on the file size, and md5.
Checking md5s works, but it can take a long time on big files.  To fix
that, I wrote a simple sparse md5 sum generator.  It takes a small
number of bytes from various areas of the file and creates an md5 by
combining all the sections.  This is, in fact, the only solution I have
come up with for watching a folder for Windows copies.
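
A rough sketch of what such a sparse md5 generator might look like.
This is not my actual code; the name sparse_md5, the sample count and
the chunk size are all assumptions picked for illustration. It hashes
the file size plus a handful of evenly spaced chunks, always including
the very first and very last chunk.

```python
import hashlib
import os


def sparse_md5(path, samples=8, chunk_size=4096):
    """Return an MD5 hex digest built from a few evenly spaced chunks.

    Hypothetical helper: `samples` and `chunk_size` are illustrative
    defaults, not values taken from any real tool.
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    size = os.path.getsize(path)
    digest = hashlib.md5()
    # Mix in the size so files of different lengths never collide by accident.
    digest.update(str(size).encode("ascii"))
    with open(path, "rb") as f:
        if size <= samples * chunk_size:
            # Small file: hashing the whole thing is cheap enough.
            digest.update(f.read())
        else:
            # Spread the offsets from the first chunk to the last chunk.
            last_offset = size - chunk_size
            for i in range(samples):
                f.seek(last_offset * i // (samples - 1))
                digest.update(f.read(chunk_size))
    return digest.hexdigest()
```

If the original is reachable, the copy can be treated as done once
sparse_md5(copy) == sparse_md5(original). Keep in mind that a sparse
sum only sees the sampled regions, so it is a completion check and not
an integrity check.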

The filesize solution doesn't work when a user copies into the watch
folder using drag and drop on Windows, because Windows sets up all the
attributes of the file, including its final size, before any data is
written.  The filesize will show the full size of the file from the
start of the copy.
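
For reference, the polling loop itself can be sketched roughly as
below. Again this is only an illustration: wait_until_stable and
size_fingerprint are made-up names, and the 45 second default is just
the figure Ethan mentioned. The check is passed in as a function, so
the size-only version can be swapped for a content-based one (such as
a sparse md5) in the Windows drag and drop case, where the size alone
would look stable too early.

```python
import os
import time


def size_fingerprint(path):
    """Cheapest fingerprint: the size reported by os.stat()."""
    return os.stat(path).st_size


def wait_until_stable(path, fingerprint=size_fingerprint,
                      quiet_seconds=45.0, poll_interval=1.0, timeout=None):
    """Block until fingerprint(path) has not changed for quiet_seconds.

    Returns True once the file looks finished, or False if `timeout`
    seconds pass first. All names and defaults here are assumptions.
    """
    start = time.monotonic()
    last_value = fingerprint(path)
    last_change = start
    while True:
        now = time.monotonic()
        if now - last_change >= quiet_seconds:
            return True
        if timeout is not None and now - start >= timeout:
            return False
        time.sleep(poll_interval)
        value = fingerprint(path)
        if value != last_value:
            # Still being written: restart the quiet period.
            last_value = value
            last_change = time.monotonic()
```

A quiet period is a heuristic either way: a writer that stalls for
longer than quiet_seconds will be reported as finished.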

~Sean


