A new and very robust method for doing file locking over NFS?

Charlie Reiman rando-15696825.5.reiman at xoxy.net
Fri Apr 18 18:27:08 CEST 2003


Douglas Alan <nessus at mit.edu> writes:

> I'd like to do file locking over NFS without using lockd.  The reason
> I want to avoid using lockd is because many lockd implementations are
> too buggy.
> 
> It is fairly easy to avoid using lockd -- just avoid using lockf() to
> lock a file.  Instead of using lockf(), lock a file by creating a lock
> file that you open with the O_CREAT | O_EXCL flags.  To unlock the
> file, you merely unlink the lock file.  This method is fairly
> reliable, except that there is a small chance with every file lock or
> unlock that something will go wrong.
> 
> The two failure symptoms, as I understand things, are as follows:
> 
> (1) The lock file might be created without the client realizing that
> it has been created if the file creation acknowledgement is lost due
> to severe network problems.  The file being locked would then remain
> locked forever (until someone manually deletes the lock) because no
> process would take responsibility for having locked the file.  This
> failure symptom is relatively benign for my purposes and if needed it
> can be fixed via the approach described in the Red Hat man page for
> the open() system call.
> 
> (2) When a process goes to remove its file lock, the acknowledgement
> for the unlink() could be lost.  If this happens, then the NFS driver
> on the client could accidentally unlink a lock file created by another
> process when it retries the unlink() request.  This failure symptom is
> pretty bad for my purposes, since it could cause a structured file to
> become corrupt.
> 
> I have an idea for a slightly different way of doing file locking that
> I think solves problem #2 (and also solves problem #1).  What if,
> instead of using a lock file to lock a file, we rename the file to
> something like "filename.locked.hostname.pid"?  If the rename()
> acknowledgement gets lost, the client will see the rename() system
> call as having failed due to the file not existing.  But in this case
> it can then check for the existence of "filename.locked.hostname.pid".
> If this file exists, then the process knows that the rename() system
> call didn't actually fail--the acknowledgement just got lost.  Later,
> when the process goes to unlock the file, it will rename the file back
> to "filename".  Again, if the rename system call appears to fail, the
> process can check for the existence of "filename.locked.hostname.pid".
> If the file no longer exists, then it knows the rename call really did
> succeed, and again the acknowledgement just got lost.
> 
> How does this sound?  Is this close to foolproof, or am I missing
> something?
> 
> I'm not much of an NFS expert, so I am a bit worried that there are
> details of NFS client-side caching that I don't understand that would
> prevent this scheme from working without some modification.
> 
> |>oug

I like this idea a lot but it has one drawback. In the traditional
lockfile approach, you can have multiple readers even if someone is
writing. Now that you rename the file, every time someone is writing the
filename is a moving target. Depending on how you intend to use the
file, this might make your solution a no-go from the get-go.

But as long as your locking behavior applies to both reading and
writing, it seems quite foolproof.
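
For concreteness, here is a rough Python sketch of the rename scheme as
I read it from the quoted post. The function names (locked_name,
acquire, release) are made up for illustration, and the sketch assumes
a Python with FileNotFoundError. It does nothing about NFS client-side
caching, which the quoted post already flags as an open question, so
treat it as an outline rather than a tested NFS implementation.

```python
import os
import socket


def locked_name(path):
    """Build the "filename.locked.hostname.pid" name from the quoted post."""
    return "%s.locked.%s.%d" % (path, socket.gethostname(), os.getpid())


def acquire(path):
    """Try to lock `path` by renaming it.

    Returns the locked name on success, or None if the lock was not obtained.
    """
    target = locked_name(path)
    try:
        os.rename(path, target)
    except FileNotFoundError:
        # Either someone else holds the lock, or our rename succeeded on the
        # server and only the acknowledgement was lost. Check which it was.
        if not os.path.exists(target):
            return None
    return target


def release(path, target):
    """Unlock by renaming back; True means the file is under `path` again."""
    try:
        os.rename(target, path)
    except FileNotFoundError:
        # If the locked name is gone, the rename did happen and only the
        # acknowledgement was lost.
        return not os.path.exists(target)
    return True
```

Readers and writers alike would call acquire() before touching the file
and work on the returned name until release().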

FWIW, I've never had any problems with NFS for traditional lock
files. NFS works pretty hard to get messages where they need to
be. If I were that deeply concerned or running across an unreliable
network, I'd probably write my own lock daemon and run it over TCP.
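
For comparison, the traditional lock file from the quoted post (open
with O_CREAT | O_EXCL, unlink to unlock) looks roughly like this in
Python. The names try_lock and unlock are placeholders of mine, and
this is only a minimal sketch: it does not handle the lost
acknowledgement cases described in the quoted post.

```python
import os


def try_lock(lock_path):
    """Create the lock file atomically; True means we now hold the lock."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # The lock file already exists, so someone else holds the lock.
        return False
    os.close(fd)
    return True


def unlock(lock_path):
    """Release the lock by removing the lock file."""
    os.unlink(lock_path)
```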

Charlie.





