wiggly at wiggly.org
Sun Mar 1 19:11:41 CET 2009
Excuse me if I'm a little blunt below. I'm ill and grumpy...
> hi nigel...
> using any kind of file locking process requires that i essentially have a
> gatekeeper, allowing a single process to enter, access the files at a
I don't believe this is a necessary condition. That would only be the
case if you allowed yourself a single lock.
> i can easily setup a file read/write lock process where a client app
> gets/locks a file, and then copies/moves the required files from the initial
> dir to a tmp dir. after the move/copy, the lock is released, and the client
> can go ahead and do whatever with the files in the tmp dir.. this process
> allows multiple clients to operate in a pseudo parallel manner...
> i'm trying to figure out if there's a much better/faster approach that might
> be available.. which is where the academic/research issue was raised..
I'm really not sure why you want to move the files around. Here are two
approaches, different from the one I initially gave you, that deal
perfectly well with a directory where files are constantly being added.
In both approaches we are going to try and avoid using OS-specific
locking mechanisms, advisory locking, flock etc. So it should work
everywhere as long as you also have write access to the filesystem.
Approach 1 - Constant Number of Processes
This requires no central manager but for every file lock requires a few
Start up N processes with the same working directory WORK_DIR.
Each process then follows this algorithm:
- sleep for some small random period.
- scan the WORK_DIR for a FILE that does not have a corresponding LOCK_FILE
- open LOCK_FILE in append mode and write our PID into it.
- close LOCK_FILE
- open LOCK_FILE
- read first line from LOCK_FILE and compare to our PID
- if the PID we just read from the LOCK_FILE matches ours then we may
  process the corresponding FILE; otherwise another process beat us to it.
After processing a file completely you can remove it and the lockfile at
the same time.
As long as filenames follow some pattern then you can simply say that
the LOCK_FILE for FILE is called FILE.lock
WORK_DIR : /home/wiggly/var/work
FILE : /home/wiggly/var/work/data_2354272.dat
LOCK_FILE : /home/wiggly/var/work/data_2354272.dat.lock
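The steps of Approach 1 could be sketched in Python roughly as follows. This is only an illustration under assumptions, not a tested production implementation: the function names (`lock_path`, `try_claim`, `find_unlocked`, `worker_pass`), the `process` callback and the sleep range are all made up for the example, and it relies on appended writes from different processes ending up on separate lines of the lock file.

```python
import os
import random
import time


def lock_path(file_path):
    # Naming convention from the description: the lock for FILE is FILE.lock
    return file_path + ".lock"


def try_claim(file_path):
    """Return True if this process won the lock for file_path."""
    lock = lock_path(file_path)
    # Open in append mode and write our PID on its own line.
    with open(lock, "a") as fh:
        fh.write("%d\n" % os.getpid())
    # Re-open and read the first line: whoever wrote first owns the file.
    with open(lock) as fh:
        first = fh.readline().strip()
    return first == str(os.getpid())


def find_unlocked(work_dir):
    """Return paths in work_dir that have no corresponding lock file."""
    names = set(os.listdir(work_dir))
    return [
        os.path.join(work_dir, n)
        for n in sorted(names)
        if not n.endswith(".lock") and n + ".lock" not in names
    ]


def worker_pass(work_dir, process):
    """One pass of the loop: sleep, scan, try to claim, process one file."""
    time.sleep(random.uniform(0.0, 0.05))  # "small random period" (assumed range)
    for path in find_unlocked(work_dir):
        if try_claim(path):
            process(path)
            # Remove the data file before its lock, so that no other
            # process ever sees the file without a lock next to it.
            os.remove(path)
            os.remove(lock_path(path))
            return path
        # Otherwise another process beat us to it; try the next file.
    return None
```

Each of the N processes would call `worker_pass(WORK_DIR, some_function)` in a loop. The sketch does not deal with a worker that dies while holding a lock; stale lock files would need separate cleanup.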
Approach 2 - Managed Processes
Here we have a single main process that spawns children. The children
listen for filenames on a pipe that the parent has open to them.
The parent constantly scans the WORK_DIR for new files to process and as
it finds one it sends that filename to a child process.
You can either be clever about the children and ensure they tell the
parent when they're free or just pass them work in a round-robin fashion.
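A minimal sketch of Approach 2 with round-robin dispatch might look like this. It assumes the children are plain Python subprocesses reading one filename per line from stdin (the pipe the parent holds open), and that "processing" a file just means deleting it. `CHILD_CODE`, `spawn_children`, `scan_new`, `dispatch` and `shutdown` are hypothetical names chosen for the example.

```python
import os
import subprocess
import sys

# Hypothetical child program: reads filenames from the pipe on stdin and
# "processes" each file by deleting it (placeholder for real work).
CHILD_CODE = r"""
import os, sys
for line in sys.stdin:
    path = line.rstrip("\n")
    if path:
        os.remove(path)
"""


def spawn_children(n):
    """Start n children, each with a pipe on stdin held open by the parent."""
    return [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                         stdin=subprocess.PIPE, text=True)
        for _ in range(n)
    ]


def scan_new(work_dir, seen):
    """Return paths not yet handed out, remembering their names in seen."""
    new = [n for n in sorted(os.listdir(work_dir)) if n not in seen]
    seen.update(new)
    return [os.path.join(work_dir, n) for n in new]


def dispatch(children, paths):
    """Send each path to the next child in round-robin order."""
    for i, path in enumerate(paths):
        child = children[i % len(children)]
        child.stdin.write(path + "\n")
        child.stdin.flush()


def shutdown(children):
    """Close the pipes so the children see EOF, then wait for them."""
    for child in children:
        child.stdin.close()
    for child in children:
        child.wait()
```

The parent would call `scan_new` and `dispatch` repeatedly while files keep arriving, and `shutdown` when it is done. The "clever" variant, where children report back when they are free, would need a second pipe from child to parent and is not shown here.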
I hope the two above descriptions make sense, let me know if they don't.
> the issue that i'm looking at is analogous to a FIFO, where i have lots of
> files being shoved in a dir from different processes.. on the other end, i
> want to allow multiple client processes to access unique groups of these
> files as fast as possible.. access being fetch/gather/process/delete the
> files. each file is only handled by a single client process.
> -----Original Message-----
> From: python-list-bounces+bedouglas=earthlink.net at python.org
> [mailto:python-list-bounces+bedouglas=earthlink.net at python.org]On Behalf
> Of Nigel Rantor
> Sent: Sunday, March 01, 2009 2:00 AM
> To: koranthala
> Cc: python-list at python.org
> Subject: Re: file locking...
> koranthala wrote:
>> On Mar 1, 2:28 pm, Nigel Rantor <wig... at wiggly.org> wrote:
>>> bruce wrote:
>>>> Got a bit of a question/issue that I'm trying to resolve. I'm asking
>>>> this of a few groups so bear with me.
>>>> I'm considering a situation where I have multiple processes running,
>>>> and each process is going to access a number of files in a dir. Each
>>>> process accesses a unique group of files, and then writes the group
>>>> of files to another dir. I can easily handle this by using a form of
>>>> locking, where I have the processes lock/read a file and only access
>>>> the group of files in the dir based on the open/free status of the
>>>> However, the issue with the approach is that it's somewhat
>>>> synchronous. I'm looking for something that might be more
>>>> asynchronous/parallel, in that I'd like to have multiple processes
>>>> each access a unique group of files from the given dir as fast as
>>> I don't see how this is synchronous if you have a lock per file. Perhaps
>>> you've missed something out of your description of your problem.
>>>> So.. Any thoughts/pointers/comments would be greatly appreciated. Any
>>>> pointers to academic research, etc.. would be useful.
>>> I'm not sure you need academic papers here.
>>> One trivial solution to this problem is to have a single process
>>> determine the complete set of files that require processing then fork
>>> off children, each with a different set of files to process.
>>> The parent then just waits for them to finish and does any
>>> post-processing required.
>>> A more concrete problem statement may of course change the solution...
>> Using twisted might also be helpful.
>> Then you can avoid the problems associated with threading too.
> No one mentioned threads.
> I can't see how Twisted in this instance isn't like using a sledgehammer
> to crack a nut.