Scanning directories for new files?

Martin Gregorie martin at address-in-sig.invalid
Tue Dec 21 14:51:25 EST 2010


On Tue, 21 Dec 2010 14:17:40 -0500, Matty Sarro wrote:

> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files, remove
> extra data that isn't needed, normalize some values, etc. The base files
> will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now we're
> looking to run it as a cron job but eventually would like to move away
> from that into making it a service running in the background.
>
Make sure the files are initially uploaded using a name that the parser 
isn't looking for, and rename each one when its upload is finished. This 
way the parser won't try to process a partially loaded file. 
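A minimal sketch of the parser side of this convention. It assumes uploads 
arrive with a hypothetical ".part" suffix and are renamed to drop it once 
complete. The parser simply ignores anything still carrying the suffix:

```python
import os

# Hypothetical convention: files are uploaded as "<name>.part" and renamed
# to "<name>" once the transfer has finished.
TEMP_SUFFIX = ".part"

def ready_files(directory):
    """Return sorted names of completed uploads in `directory`.

    Subdirectories and in-progress ".part" files are skipped.
    """
    with os.scandir(directory) as entries:
        return sorted(
            e.name
            for e in entries
            if e.is_file() and not e.name.endswith(TEMP_SUFFIX)
        )
```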

If you are uploading to a *nix machine, the rename can move the file 
between directories, provided both directories are in the same 
filesystem. Under those conditions rename is always an atomic operation 
with no copying involved. This would allow you to, say, upload the file 
to "temp/myfile" and rename it to "uploaded/myfile", with your parser 
only scanning the uploaded directory and, presumably, renaming processed 
files to move them to a third directory ready for further processing.
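The three-directory scheme could be sketched in Python roughly as follows. 
This is only an illustration: "uploaded" comes from the text, while 
"processed", process_file(), scan_once() and run_forever() are hypothetical 
names. All directories are assumed to live on the same filesystem so that 
os.rename() stays atomic. scan_once() suits a cron job; run_forever() is a 
simple polling loop for the later "background service" stage.

```python
import os
import time

UPLOADED = "uploaded"    # directory name from the text
PROCESSED = "processed"  # hypothetical name for the third directory

def process_file(path):
    """Hypothetical placeholder for the real parser (cleaning, normalising)."""
    with open(path, "rb") as f:
        f.read()

def scan_once(base_dir, handler=process_file):
    """Handle every file in uploaded/, then move it into processed/.

    Returns the names of the files handled, in sorted order.
    """
    src_dir = os.path.join(base_dir, UPLOADED)
    dst_dir = os.path.join(base_dir, PROCESSED)
    os.makedirs(dst_dir, exist_ok=True)
    with os.scandir(src_dir) as entries:
        names = sorted(e.name for e in entries if e.is_file())
    handled = []
    for name in names:
        src = os.path.join(src_dir, name)
        handler(src)
        # Same filesystem, so this is an atomic rename with no copying.
        os.rename(src, os.path.join(dst_dir, name))
        handled.append(name)
    return handled

def run_forever(base_dir, interval=10.0):
    """Long-running alternative to cron: poll every `interval` seconds."""
    while True:
        scan_once(base_dir)
        time.sleep(interval)
```

Because a file only appears in uploaded/ via an atomic rename, the scanner 
never sees a half-written file, and moving it out afterwards means each 
file is picked up exactly once without keeping any "already seen" state.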

I've used this technique reliably with files arriving via FTP at quite 
high rates.
  

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


