Determining when a file has finished copying

Ethan Furman ethan at stoneleaf.us
Mon Jul 14 13:28:18 EDT 2008


Sean DiZazzo wrote:
> On Jul 9, 5:34 pm, keith <ke... at keithperkins.net> wrote:
> 
>>
>>Ethan Furman wrote:
>>
>>>writeson wrote:
>>>
>>>>Guys,
>>
>>>>Thanks for your replies, they are helpful. I should have included in
>>>>my initial question that I don't have as much control over the program
>>>>that writes (pgm-W) as I'd like. Otherwise, the write to a different
>>>>filename and then rename solution would work great. Is there no way to
>>>>tell from the os.stat() results when the file has finished
>>>>being copied? I ran some test programs, one of which continuously
>>>>copies big files from one directory to another, and another that
>>>>continuously does a glob.glob("*.pdf") on those files and looks at the
>>>>st_atime and st_mtime parts of the return value of os.stat(filename).
>>>>
>>>>From that experiment it looks like st_atime and st_mtime equal each
>>>>other until the file has finished being copied. Nothing in the
>>>>documentation about st_atime or st_mtime leads me to think this is
>>>>guaranteed; it's just what I observed with the two test programs I've
>>>>described.
>>
>>>>Any thoughts? Thanks!
>>>>Doug
>>
>>>The solution my team has used is to monitor the file size.  If the file
>>>has stopped growing for x amount of time (we use 45 seconds) the file is
>>>done copying.  Not elegant, but it works.
>>>--
>>>Ethan
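
For reference, the size-watching approach quoted above might look roughly
like the sketch below. The function name and the poll_interval parameter
are made up for illustration; only the 45-second quiet period comes from
the description. It assumes the writer grows the file as it goes, so it
would not help where the full size is allocated up front.

```python
import os
import time


def wait_until_size_stable(path, quiet_seconds=45.0, poll_interval=1.0):
    """Block until the size of `path` has not changed for `quiet_seconds`.

    Returns the last observed size in bytes.
    """
    last_size = os.stat(path).st_size
    last_change = time.monotonic()
    while time.monotonic() - last_change < quiet_seconds:
        time.sleep(poll_interval)
        size = os.stat(path).st_size
        if size != last_size:
            # The size moved, so restart the quiet timer.
            last_size = size
            last_change = time.monotonic()
    return last_size
```

A watcher would call wait_until_size_stable(filename) for each new file
it sees and only then hand the file on for processing.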
>>
>>Also I think that matching the md5sums may work.  Just set up so that it
>>checks the copy's md5sum every couple of seconds (or whatever time
>>interval you want) and matches against the original's.  When they match
>>copying's done. I haven't actually tried this but think it may work.
>>Any more experienced programmers out there let me know if this is
>>unworkable please.
>>K
> 
> 
> I use a combination of both os.stat() on the file size and md5.
> Checking md5s works, but it can take a long time on big files.  To fix
> that, I wrote a simple sparse md5 sum generator.  It takes a small
> number of bytes from various areas of the file, and creates an md5 by
> combining all the sections. This is, in fact, the only solution I have
> come up with for watching a folder for Windows copies.
> 
> The filesize solution doesn't work when a user copies into the watch
> folder using drag and drop on Windows, because Windows sets all the
> attributes of the file, including its size, before any data is
> written.  The filesize will always show the full size of the file.
> 
> ~Sean
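
A sparse md5 along the lines Sean describes could be sketched as below.
This is not his code, just a guess at the idea: the function name, the
number of samples, and the chunk size are all assumptions. It hashes the
file size plus a few evenly spaced chunks, so the cost stays roughly
constant however big the file is.

```python
import hashlib
import os


def sparse_md5(path, samples=8, chunk_size=4096):
    """Return an MD5 hex digest built from a few chunks spread over the file."""
    if samples < 2:
        raise ValueError("samples must be at least 2")
    size = os.path.getsize(path)
    digest = hashlib.md5()
    # Mix in the size so a change in length alone changes the digest.
    digest.update(str(size).encode("ascii"))
    with open(path, "rb") as f:
        if size <= samples * chunk_size:
            # Small file: hashing everything is cheap enough.
            digest.update(f.read())
        else:
            # Evenly spaced offsets; the last chunk ends exactly at EOF.
            for i in range(samples):
                f.seek(i * (size - chunk_size) // (samples - 1))
                digest.update(f.read(chunk_size))
    return digest.hexdigest()
```

One plausible use is to take the digest every few seconds and treat the
copy as finished once two successive digests agree. Bytes between the
sampled chunks are never read, so a change there goes unnoticed.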

Good info, Sean, thanks.  One more option may be to attempt to rename 
the file -- if it's still open for copying, that will fail; success 
indicates the copy is done.  Of course, as Larry Bates pointed out, this 
could fail if the copy is followed by a re-open and appending. 
Hopefully that's not an issue for the OP.
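
A rough sketch of the rename test, with a made-up function name and a
made-up ".probe" suffix for the temporary name. It only tells you
anything on platforms that refuse to rename a file another process has
open, which is typically the case on Windows; on POSIX systems a rename
normally succeeds even while the file is still being written.

```python
import os


def looks_finished(path):
    """Return True if `path` can be renamed away and back again."""
    probe = path + ".probe"  # hypothetical temporary name
    if os.path.exists(probe):
        # Refuse to clobber an unrelated file that has the probe name.
        raise FileExistsError(probe)
    try:
        os.rename(path, probe)
    except OSError:
        # Rename refused (or the file is gone): treat as not finished.
        return False
    # Put the file back under its original name.
    os.rename(probe, path)
    return True
```
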
--
Ethan


