Does Python need a file locking module (slightly higher level)?

I'm always daunted by the prospect of trying to implement file locking. This just came up again in SpamBayes where we have never protected our pickle files from corruption when multiple processes access them simultaneously. The presence of networked file systems and platform-independent locks make it a nasty little problem. Maybe I'm just showing my age. Does fcntl.flock work over NFS and SMB and on Windows? If this is still as much of a mess as I remember, should Python provide a simple file locking module in the standard distribution?

Side note: While reading the fcntl man page on my Mac I came across this text in the description of F_GETLK/F_SETLK/F_SETLKW:

    This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process.... Flock(2) is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children.

I guess the BSD folks were a bit upset when they wrote that.

Skip
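For readers following along, here is a minimal sketch of what taking an flock(2)-style lock looks like from Python on a Unix-like system. The helper name `flocked` is made up for this illustration. It says nothing about whether the lock is honoured over NFS or SMB, which is exactly the open question above, and the fcntl module is not available on Windows.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def flocked(path):
    """Hold an exclusive flock(2) lock on `path` while the block runs.

    `flocked` is a hypothetical helper name, not part of any library.
    """
    # Open (creating if needed) the file that carries the lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other open file holds a conflicting lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Unlock explicitly, then close our descriptor.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A caller would wrap the pickle read or write in `with flocked("/some/path.lock"):`. Passing `fcntl.LOCK_EX | fcntl.LOCK_NB` instead would make the attempt fail with OSError rather than block.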

On Oct 22, 2007, at 8:15 AM, skip@pobox.com wrote:
I'm always daunted by the prospect of trying to implement file locking. This just came up again in SpamBayes where we have never protected our pickle files from corruption when multiple processes access them simultaneously. The presence of networked file systems and platform-independent locks make it a nasty little problem. Maybe I'm just showing my age. Does fcntl.flock work over NFS and SMB and on Windows? If this is still as much of a mess as I remember, should Python provide a simple file locking module in the standard distribution?
If you want something like this, you might start by looking at Mailman's LockFile.py. It has a particular set of semantics (such as lock breaking) that you might not be interested in, but it is, or can be, mostly de-Mailmanized for use as a general library module. In the particular use case it is designed for, it's been quite stable for many years. Essentially it provides an NFS-safe lock file implementation.

-Barry

http://codebrowse.launchpad.net/~mailman-coders/mailman/3.0/annotate/barry%40python.org-20071011032203-w1j8qrmtlpkrvay4?file_id=mailmanlockfile.py-20070507165525-0o0kligrooe34vyc-172

Barry Warsaw wrote:
On Oct 22, 2007, at 8:15 AM, skip@pobox.com wrote:
I'm always daunted by the prospect of trying to implement file locking.
If you want something like this, you might start by looking at Mailman's LockFile.py.
Also related is the very simple zc.lockfile: http://pypi.python.org/pypi/zc.lockfile -- Benji York http://benjiyork.com

>> I'm always daunted by the prospect of trying to implement file
>> locking.

Barry> If you want something like this, you might start by looking at
Barry> Mailman's LockFile.py.

If I interpret the Python documentation for os.link correctly, the scheme used by Mailman won't work on Windows (os.link isn't advertised as being available there). Nevertheless, the pointer to the Linux open(2) man page was a good start. Implementing something for Unix-y systems is not too difficult using that advice.

Jean-Paul Calderone sent me a pointer to Twisted's file locking module. I'm still trying to figure out exactly what it does on Windows. Something about making and populating directories. (os.mkdir is the replacement for os.link?)

Benji York referred me to zc.lockfile. That appears to use fcntl.flock. Based on Jean-Paul's response it seems the jury is still out on whether fcntl.flock works over NFS. zc.lockfile at least has something specifically for Windows. Whether or not msvcrt.locking() works on networked file systems remains to be seen.

Mailman's lockfile makes provision to block for a user-specified period of time. The others push the waiting back onto the calling code.

It's not clear that any of these implementations is going to be perfect. Maybe none ever will be. In his reply Jean-Paul made this comment:

    It might be nice to have something like that in the standard library, but it's very simple once you know what to do.

I'm not so sure about the "very simple" part, especially if you aren't familiar with all the ins and outs of the different platforms. The fact that the first three bits of code I was referred to were implemented by three significant Python tools/platforms, and that all are different in some significant ways, suggests that there is both an underlying need for a file locking mechanism and a lack of consensus about the best way to implement the mother-of-all-file-locking schemes for Python. Maybe the best place for this is in the distribution.

PEP?

Skip
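The open(2) advice mentioned above (create a uniquely named file, hard-link it to the lock name, and trust the link count rather than link()'s return value) can be sketched roughly as follows. This is only an illustration of that recipe, not Mailman's LockFile.py; the function names are invented here, and it assumes a file system that supports hard links.

```python
import os
import socket
import uuid


def try_link_lock(lock_path):
    """Make one attempt to take the lock.

    Returns the path of our unique file on success, or None if someone
    else holds the lock. Hypothetical helper, for illustration only.
    """
    # The unique name includes host, PID and a random token so that no
    # two attempts (even within one process) share a file.
    unique = "%s.%s.%d.%s" % (
        lock_path, socket.gethostname(), os.getpid(), uuid.uuid4().hex)
    with open(unique, "w") as f:
        f.write(unique)
    try:
        os.link(unique, lock_path)
        return unique
    except OSError:
        # link() can report failure even though it succeeded, so the
        # link count on our unique file is the real answer.
        if os.stat(unique).st_nlink == 2:
            return unique
        os.unlink(unique)
        return None


def release_link_lock(lock_path, unique):
    """Drop the lock taken by try_link_lock()."""
    os.unlink(lock_path)
    os.unlink(unique)
```

Waiting, timeouts and breaking stale locks are all left to the caller in this sketch, which is where most of the subtlety lives.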

On Oct 22, 2007, at 11:30 PM, skip@pobox.com wrote:
It's not clear that any of these implementations is going to be perfect. Maybe none ever will be.
I would agree with this. You write a program and know you need to implement some kind of resource locking, so you start looking for some OTS solution. But then you realize that your application needs somewhat different semantics or needs to work in platforms or environments that the OTS code doesn't handle. Just a few days ago, I was looking at some locking code that needed to work across multiple invocations of a script on multiple machines, and the only thing they shared was a PostgreSQL connection, so we ended up wanting to use its advisory locks.
In his reply Jean-Paul made this comment:
It might be nice to have something like that in the standard library, but it's very simple once you know what to do.
I'm not so sure about the "very simple" part, especially if you aren't familiar with all the ins and outs of the different platforms.
I'd totally agree with this. Locking seems simple, but it's got some really tricky aspects that need to be coded just right or you'll be in a world of hurt. Mailman's LockFile.py (which you're right is *nix only) is stable now, but has had some really subtle bugs in the past.
The fact that the first three bits of code I was referred to were implemented by three significant Python tools/platforms, and that all are different in some significant ways, suggests that there is both an underlying need for a file locking mechanism and a lack of consensus about the best way to implement the mother-of-all-file-locking schemes for Python. Maybe the best place for this is in the distribution. PEP?
I don't think any one solution will work for everybody. I'm not even sure we can define a common API a la the DBAPI, but if something were to make it into the standard distribution, that's the direction I'd go in. Then we can provide various implementations that support the LockingAPI under various environments, constraints, and platforms. If we wanted to distribute them in the stdlib, we could put them all in a package and let the user decide which features they need. I'm still planning on de-Mailman-ifying LockFile.py sometime soon.

-Barry

On 2007-10-26 05:41, Barry Warsaw wrote:
On Oct 22, 2007, at 11:30 PM, skip@pobox.com wrote:
It's not clear that any of these implementations is going to be perfect. Maybe none ever will be.
I would agree with this. You write a program and know you need to implement some kind of resource locking, so you start looking for some OTS solution. But then you realize that your application needs somewhat different semantics or needs to work in platforms or environments that the OTS code doesn't handle. Just a few days ago, I was looking at some locking code that needed to work across multiple invocations of a script on multiple machines, and the only thing they shared was a PostgreSQL connection, so we ended up wanting to use its advisory locks.
In his reply Jean-Paul made this comment:
It might be nice to have something like that in the standard library, but it's very simple once you know what to do.
I'm not so sure about the "very simple" part, especially if you aren't familiar with all the ins and outs of the different platforms.
I'd totally agree with this. Locking seems simple, but it's got some really tricky aspects that need to be coded just right or you'll be in a world of hurt. Mailman's LockFile.py (which you're right is *nix only) is stable now, but has had some really subtle bugs in the past.
You might want to take a look at the FileLock.py module that's part of the eGenix mx Base distribution (mx.Misc.FileLock). It works reliably on Unix and Windows, doesn't rely on fcntl and has been in use for years. The only downside is that it's application specific, ie. only applications using the module for locking will detect the locks - but then again: this is exactly the problem you typically want to solve.
The fact that the first three bits of code I was referred to were implemented by three significant Python tools/platforms, and that all are different in some significant ways, suggests that there is both an underlying need for a file locking mechanism and a lack of consensus about the best way to implement the mother-of-all-file-locking schemes for Python. Maybe the best place for this is in the distribution. PEP?
I don't think any one solution will work for everybody. I'm not even sure we can define a common API a la the DBAPI, but if something were to make it into the standard distribution, that's the direction I'd go in. Then we can provide various implementations that support the LockingAPI under various environments, constraints, and platforms. If we wanted to distribute them in the stdlib, we could put them all in a package and let the user decide which features they need.
I'm still planning on de-Mailman-ifying LockFile.py sometime soon.
-- Marc-Andre Lemburg, eGenix.com

Barry> I don't think any one solution will work for everybody. I'm not
Barry> even sure we can define a common API a la the DBAPI, but if
Barry> something were to make it into the standard distribution, that's
Barry> the direction I'd go in.

I've been working on a lockfile module the past few days on the train. I have something that passes a boatload of doctest test cases on my Mac, works for threads as well as processes (acquire the lock in thread one, then block in thread two until thread one releases the lock). The Unix version relies on the atomic nature of the link(2) system call. The Windows version (not yet tested on that platform) relies on mkdir(2) being atomic. (I think it is.) In theory, I suppose the mkdir version will work for both platforms, so it's possible that you could have file locking between Windows and Unix should you want it.

The current implementation provides a FileLock class with these methods:

    acquire
    release
    is_locked
    break_lock
    __enter__
    __exit__

The acquire method takes an optional timeout parameter (None => wait forever, +ive value => block for a while, zero or -ive => give up immediately). The others all take no arguments.

I'm working on ReST documentation now and hope to have that finished over the weekend. After that I'll write a simple setup.py, toss it out on PyPI, then announce it more broadly. If anyone would like to try it out and/or review the code sooner (especially if you have access to Windows) let me know. I'll shoot you a copy as it currently exists.

The API and almost all test cases are defined in a _FileLock base class. You could (in theory at least) subclass it to provide locking through some other shared resource like a database and not have to write any or many other test cases.

Skip
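To make the described interface concrete, here is a rough sketch of a mkdir-based lock with the six methods and the timeout behaviour listed above. It is not Skip's code: the class name `MkdirLock`, the `LockNotAcquired` exception, the ".lock" suffix and the polling interval are all assumptions made for this example, since the message does not say how a failed acquire is reported or how waiting is implemented.

```python
import os
import time


class LockNotAcquired(Exception):
    """Hypothetical exception raised when acquire() gives up."""


class MkdirLock:
    """Illustrative lock that relies on directory creation being atomic."""

    def __init__(self, path, poll_interval=0.05):
        self.lock_dir = path + ".lock"      # assumed naming convention
        self.poll_interval = poll_interval  # assumed polling strategy

    def acquire(self, timeout=None):
        # None: wait forever; positive: wait that long; <= 0: one attempt.
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            try:
                os.mkdir(self.lock_dir)     # succeeds for exactly one caller
                return
            except FileExistsError:
                pass
            if deadline is not None and time.monotonic() >= deadline:
                raise LockNotAcquired(self.lock_dir)
            time.sleep(self.poll_interval)

    def release(self):
        os.rmdir(self.lock_dir)

    def is_locked(self):
        return os.path.isdir(self.lock_dir)

    def break_lock(self):
        # Remove the lock no matter who holds it.
        try:
            os.rmdir(self.lock_dir)
        except FileNotFoundError:
            pass

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
```

With this shape, `with MkdirLock("/path/to/pickle.db"):` protects a critical section, and `acquire(timeout=0)` gives a non-blocking attempt. Note that release() here does not check ownership, so a real implementation would need more care.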

On Oct 26, 2007, at 4:10 PM, skip@pobox.com wrote:
Barry> I don't think any one solution will work for everybody. I'm not Barry> even sure we can define a common API a la the DBAPI, but if Barry> something were to make it into the standard distribution, that's Barry> the direction I'd go in.
I've been working on a lockfile module the past few days on the train. I have something that passes a boatload of doctest test cases on my Mac, works for threads as well as processes (acquire the lock in thread one, then block in thread two until thread one releases the lock). The Unix version relies on the atomic nature of the link(2) system call. The Windows version (not yet tested on that platform) relies on mkdir(2) being atomic. (I think it is.) In theory, I suppose the mkdir version will work for both platforms, so it's possible that you could have file locking between Windows and Unix should you want it.
The current implementation provides a FileLock class with these methods:
acquire release is_locked break_lock __enter__ __exit__
The acquire method takes an optional timeout parameter (None => wait forever, +ive value => block for awhile, zero or -ive => give up immediately). The others all take no arguments. I'm working on ReST documentation now and hope to have that finished over the weekend. After that I'll write a simple setup.py, toss it out on PyPI, then announce it more broadly. If anyone would like to try it out and/or review the code sooner (especially if you have access to Windows) let me know. I'll shoot you a copy as it currently exists.
The API and almost all test cases are defined in a _FileLock base class. You could (in theory at least) subclass it to provide locking through some other shared resource like a database and not have to write any or many other test cases.
It sounds pretty interesting, Skip, though I won't have time to look at that for a couple of weeks. I did throw mine into PyPI: http://pypi.python.org/pypi/locknix/1.0 Mostly did it just to get it off my plate. You're being more diligent with tests and documentation than I am, and if your stuff works on Windows too, more thorough in your implementation. There is one feature locknix has which probably isn't very interesting to other applications, and that is the ability to transfer a lock between processes across a fork.

-Barry

> The API and almost all test cases are defined in a _FileLock base
> class. You could (in theory at least) subclass it to provide locking
> through some other shared resource like a database and not have to
> write any or many other test cases.

Okay, this is up on my website: http://www.webfast.com/~skip/python/

It took me a little longer to implement than I thought because I decided to implement an SQLite-based _FileLock subclass, mostly as a proof-of-concept. I'm still waiting for the name "lockfile" to free up in PyPI to put it there.

Skip
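As an illustration of the database idea, a lock can be represented as a row whose primary key is the lock name, so that only one INSERT can succeed. The sketch below uses the sqlite3 module; the table layout, the class name `SQLiteLock` and its methods are assumptions for this example and are not taken from the module announced above. SQLite itself depends on the locking of the underlying file system, so this does not by itself settle the networked file system question.

```python
import sqlite3
import uuid


class SQLiteLock:
    """Illustrative lock stored as a row in a shared SQLite database."""

    def __init__(self, db_path, name):
        self.name = name
        self.owner = uuid.uuid4().hex  # identifies this lock object
        # isolation_level=None puts the connection in autocommit mode.
        self.conn = sqlite3.connect(db_path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS locks "
            "(name TEXT PRIMARY KEY, owner TEXT NOT NULL)")

    def try_acquire(self):
        # The primary key lets at most one row per lock name exist.
        try:
            self.conn.execute(
                "INSERT INTO locks (name, owner) VALUES (?, ?)",
                (self.name, self.owner))
            return True
        except sqlite3.IntegrityError:
            return False

    def release(self):
        # Only removes the row if this object is the owner.
        self.conn.execute(
            "DELETE FROM locks WHERE name = ? AND owner = ?",
            (self.name, self.owner))

    def is_locked(self):
        row = self.conn.execute(
            "SELECT 1 FROM locks WHERE name = ?", (self.name,)).fetchone()
        return row is not None
```

Blocking with a timeout could be layered on top by retrying try_acquire() in a loop, in the same way as the file-based variants.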

skip> Okay, this is up on my website:
skip> http://www.webfast.com/~skip/python/

And on PyPI: http://pypi.python.org/pypi/lockfile/

Skip

skip@pobox.com wrote:
This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process.... Flock(2) is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children.
I guess the BSD folks were a bit upset when they wrote that.
That sounds more like something written by the GNU folks than the BSD folks. -- Greg

On Tue, Oct 23, 2007 at 12:16:41PM +1300, Greg Ewing wrote:
skip@pobox.com wrote:
This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process.... Flock(2) is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children.
I guess the BSD folks were a bit upset when they wrote that.
That sounds more like something written by the GNU folks than the BSD folks.
It's from BSD. The earliest I can find it is 4.4BSD. It's still in the latest versions of OpenBSD, NetBSD and FreeBSD.

skip@pobox.com wrote:
Does fcntl.flock work over NFS and SMB and on Windows?
I don't think file locking will ever work over NFS, since it's a stateless protocol by design, and locking would require maintaining state on the server. -- Greg

On Tue, Oct 23, 2007 at 12:29:35PM +1300, Greg Ewing wrote:
skip@pobox.com wrote:
Does fcntl.flock work over NFS and SMB and on Windows?
I don't think file locking will ever work over NFS, since it's a stateless protocol by design, and locking would require maintaining state on the server.
You can do file locking over NFS, that's one of the reasons people use fcntl. It uses an RPC side channel separate to the main NFS protocol.

On Tue, 23 Oct 2007 01:11:39 +0100, Jon Ribbens <jon+python-dev@unequivocal.co.uk> wrote:
On Tue, Oct 23, 2007 at 12:29:35PM +1300, Greg Ewing wrote:
skip@pobox.com wrote:
Does fcntl.flock work over NFS and SMB and on Windows?
I don't think file locking will ever work over NFS, since it's a stateless protocol by design, and locking would require maintaining state on the server.
You can do file locking over NFS, that's one of the reasons people use fcntl. It uses an RPC side channel separate to the main NFS protocol.
You can do it. It just doesn't work. (You could say the same about regular read and write operations for many NFS implementations, though) Jean-Paul
participants (8)
- "Martin v. Löwis"
- Barry Warsaw
- Benji York
- Greg Ewing
- Jean-Paul Calderone
- Jon Ribbens
- M.-A. Lemburg
- skip@pobox.com