Determining when a file has finished copying
Ethan Furman
ethan at stoneleaf.us
Mon Jul 14 13:28:18 EDT 2008
Sean DiZazzo wrote:
> On Jul 9, 5:34 pm, keith <ke... at keithperkins.net> wrote:
>
>>Ethan Furman wrote:
>>
>>>writeson wrote:
>>>
>>>>Guys,
>>
>>>>Thanks for your replies; they are helpful. I should have mentioned in
>>>>my initial question that I don't have as much control over the program
>>>>that writes the files (pgm-W) as I'd like; otherwise the "write to a
>>>>different filename, then rename" solution would work great. Is there
>>>>no way to tell from os.stat() when a file has finished being copied?
>>>>I ran two test programs: one continuously copies big files from one
>>>>directory to another, and the other continuously runs glob.glob("*.pdf")
>>>>on those files and looks at the st_atime and st_mtime fields of the
>>>>value returned by os.stat(filename).
>>>>
>>>>From that experiment it looks like st_atime and st_mtime equal each
>>>>other until the file has finished being copied. Nothing in the
>>>>documentation about st_atime or st_mtime leads me to think this is
>>>>true; it's just what I observed with the two test programs described
>>>>above.
>>
>>>>Any thoughts? Thanks!
>>>>Doug
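Doug's experiment can be reproduced with a few lines of Python. This is a minimal sketch; the function name atime_equals_mtime and the "*.pdf" default pattern are illustrative only. It merely reports the coincidence he observed: nothing in the os.stat() documentation ties atime/mtime equality to copy completion, and on Linux, mount options such as noatime or relatime change when st_atime gets updated at all.

```python
import glob
import os

def atime_equals_mtime(pattern="*.pdf"):
    """Map each file matching *pattern* to whether st_atime == st_mtime.

    This only mirrors the experiment described above; the equality is an
    observed coincidence, not documented behaviour of os.stat().
    """
    results = {}
    for filename in glob.glob(pattern):
        st = os.stat(filename)  # stat() reads metadata only, not contents
        results[filename] = st.st_atime == st.st_mtime
    return results
```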
>>
>>>The solution my team has used is to monitor the file size. If the file
>>>has stopped growing for x amount of time (we use 45 seconds) the file is
>>>done copying. Not elegant, but it works.
>>>--
>>>Ethan
>>
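Ethan's size-monitoring approach can be sketched as a simple polling loop. The names wait_until_stable and poll_interval are hypothetical; quiet_period defaults to the 45 seconds Ethan's team uses. Keep in mind Sean's caveat later in the thread: a Windows drag-and-drop copy may report the full size before any data is written, so a stable size does not prove the copy is complete there.

```python
import os
import time

def wait_until_stable(path, quiet_period=45.0, poll_interval=1.0):
    """Block until the size of *path* has not changed for quiet_period seconds.

    Returns the final size in bytes. Raises OSError (e.g. FileNotFoundError)
    if the file disappears while it is being watched.
    """
    last_size = os.path.getsize(path)
    last_change = time.monotonic()
    while True:
        time.sleep(poll_interval)
        size = os.path.getsize(path)
        now = time.monotonic()
        if size != last_size:
            # Size changed: remember it and restart the quiet period.
            last_size = size
            last_change = now
        elif now - last_change >= quiet_period:
            return size
```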
>>Also, I think comparing md5sums may work. Just set it up so that it
>>checks the copy's md5sum every couple of seconds (or whatever interval
>>you want) and compares it against the original's. When they match,
>>copying is done. I haven't actually tried this, but I think it may work.
>>If any more experienced programmers out there know this to be
>>unworkable, please let me know.
>>K
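Keith's checksum comparison can be sketched with hashlib, the modern replacement for the old md5 module. This assumes the watcher can read the original file as well as the copy, which is not always the case; the names md5_of and wait_for_matching_copy, and the two-second default interval, are illustrative. The source digest is computed once, but the copy is rehashed in full on every poll, which is the cost Sean mentions below for big files.

```python
import hashlib
import time

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the whole file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def wait_for_matching_copy(source, copy, poll_interval=2.0, timeout=None):
    """Poll until md5(copy) == md5(source).

    Returns True once they match, or False if *timeout* seconds elapse first
    (timeout=None means wait forever).
    """
    target = md5_of(source)  # assumes the original is complete and unchanging
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            if md5_of(copy) == target:
                return True
        except FileNotFoundError:
            pass  # the copy may not have been created yet
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```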
>
>
> I use a combination of both os.stat() on the file size, and md5.
> Checking md5s works, but it can take a long time on big files. To fix
> that, I wrote a simple sparse md5 sum generator. It reads a small
> number of bytes from various areas of the file and creates an md5 by
> combining all the sections. This is, in fact, the only solution I have
> come up with for watching a folder for Windows copies.
>
> The file size solution doesn't work when a user copies into the watch
> folder by drag and drop on Windows, because Windows sets all of the
> file's attributes, including its size, before any data is written. The
> file size will always show the full size of the file.
>
> ~Sean
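Sean doesn't say exactly which bytes his sparse md5 generator reads, so the following is only a guess at the idea: hash the file size plus a fixed number of evenly spaced slices. The name sparse_md5 and the samples/sample_size defaults are assumptions. A watcher could compare successive results and treat an unchanged digest over some interval as "probably finished"; any change that falls entirely between the sampled slices goes unnoticed.

```python
import hashlib
import os

def sparse_md5(path, samples=16, sample_size=4096):
    """MD5 of the file size plus *samples* evenly spaced slices of the file.

    Much cheaper than hashing a big file completely, but any change that
    falls entirely between the sampled slices goes unnoticed.
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    size = os.path.getsize(path)
    digest = hashlib.md5(str(size).encode("ascii"))
    with open(path, "rb") as f:
        if size <= samples * sample_size:
            digest.update(f.read())  # small file: just hash all of it
        else:
            # Spread the slices from offset 0 up to the last full slice.
            step = (size - sample_size) // (samples - 1)
            for i in range(samples):
                f.seek(i * step)
                digest.update(f.read(sample_size))
    return digest.hexdigest()
```

Because a preallocated Windows copy reads back as zeros until its data arrives, the sampled slices keep changing while the copy is in progress, which a plain size check cannot see.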
Good info, Sean, thanks. One more option may be to attempt to rename
the file: if it's still open for copying, the rename will fail, so success
indicates the copy is done. Of course, as Larry Bates pointed out, this
check could be fooled if the copy is followed by a re-open and appending.
Hopefully that's not an issue for the OP.
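The rename test might look like the sketch below; try_claim and its arguments are hypothetical names. Renaming straight to the destination doubles as the hand-off, so the check and the move are a single operation. The failure it relies on is mainly a Windows behaviour: a file another process holds open typically cannot be renamed there, whereas on POSIX systems rename() normally succeeds even while the writer still has the file open, so success proves nothing on those systems. On Windows os.rename() also fails if claimed_path already exists, which this sketch would misread as "still copying".

```python
import os

def try_claim(path, claimed_path):
    """Attempt to rename *path* to *claimed_path*.

    Returns True if the rename worked (treated as "copy finished"), or False
    if the OS refused, which on Windows usually means another process still
    has the file open.
    """
    try:
        os.rename(path, claimed_path)
    except OSError:
        return False
    return True
```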
--
Ethan