Another issue with Mailman locking: I think it's coarse-grained and needs to be refined. For example, this morning someone noticed that the umbrella listinfo page on python.org was hanging because, it turns out, one of the lists was left in a locked state. (Not sure how that happened; it usually doesn't.) Well, the umbrella listinfo (and admin) pages open the lists only to determine which ones are public. No writing is actually required, so a lock isn't really needed.
It might be sufficient to offer a way to open a list "read-only", such that no locking is done. I think it would be better, however, to refine the locking system so that lists are not locked until they enter an operation that changes the list data in a way that will eventually need to be written. This would probably take a bit more effort than a "read-only" kluge, but may not be too bad.
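A minimal Python sketch of the second option (lock lazily, only once a change is coming). It assumes a hypothetical `MailList` class that keeps its data as JSON in one file; `set_option()`, `save()`, the `advertised` key and the lock object's `acquire()`/`release()` are illustrative names, not Mailman's actual API:

```python
import json
import os


class MailList:
    """Hypothetical list object that takes its lock lazily.

    Reading never locks; the lock is acquired the first time an operation
    is about to change data that will later be written back.
    """

    def __init__(self, path, lock):
        self.path = path
        self._lock = lock          # any object with acquire() / release()
        self._locked = False
        self.data = self._read()

    def _read(self):
        with open(self.path) as f:
            return json.load(f)

    def is_public(self):
        # Read-only query, e.g. for the umbrella listinfo page: no lock.
        return bool(self.data.get("advertised", False))

    def _lock_for_write(self):
        if not self._locked:
            self._lock.acquire()
            self._locked = True
            # Re-read under the lock so we change the latest saved state,
            # not the possibly stale copy loaded without the lock.
            self.data = self._read()

    def set_option(self, key, value):
        self._lock_for_write()
        self.data[key] = value

    def save(self):
        if not self._locked:
            return                 # nothing was changed, so nothing to write
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)   # readers see old or new, never half
        self._lock.release()
        self._locked = False
```

The re-read inside `_lock_for_write()` is the important detail: without it, a writer could clobber changes another process saved between the unlocked read and the moment the lock was taken.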
I guess this is one for the todo list. (Wish I had gobs of time to continue hacking on these sorts of things. They're not very big or hard, but there are a bunch of them, and they need to be done right.)
Ken
Well, if one of the lists was left locked, there was probably a bug... Anyway, WRT locking, I think that at a minimum several things need to happen:
Locks should include a time stamp. After x seconds (30?) a lock holder can lose the lock. Anything long-running, such as the archiving software, should use that window to copy the archive over to a tmp file and null out the old file so that other processes can overwrite it (not that hard).
A version of Load() should be written that doesn't call Lock() (call it LoadUnlocked(), I guess). Change the callers that don't care about locking to use it (even easier).
Make sure Lock() uses portable calls (might require a bit of research).
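The `Load()`/`LoadUnlocked()` split could look roughly like this in Python. Only the names `Load`, `Lock` and `LoadUnlocked` come from the proposal; the class layout, JSON storage, `advertised` key and `public_lists()` helper are assumptions for illustration:

```python
import json
import os


class MailList:
    """Hypothetical stand-in for a Mailman list, showing the Load() split."""

    def __init__(self, path):
        self.path = path
        self.locked = False
        self.data = {}

    def Lock(self):
        # Placeholder: a real Lock() would block on one of the schemes
        # discussed in this thread before returning.
        self.locked = True

    def LoadUnlocked(self):
        # Read saved state without locking.  If writers save by writing a
        # temp file and renaming it over the old one, this sees either the
        # old or the new data, never a half-written file.
        with open(self.path) as f:
            self.data = json.load(f)

    def Load(self):
        # Callers that will modify and save the list lock first, then read.
        self.Lock()
        self.LoadUnlocked()


def public_lists(paths):
    """Names of advertised lists: all the umbrella listinfo page needs."""
    names = []
    for path in paths:
        ml = MailList(path)
        ml.LoadUnlocked()
        if ml.data.get("advertised"):
            names.append(os.path.splitext(os.path.basename(path))[0])
    return names
```

With this split, a list left locked by a crashed process no longer hangs the listinfo page, since that page never calls `Lock()`.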
I can look into this stuff next week, but if anyone wants to do it sooner, let us know and feel free...
John
On Tue, May 05, 1998 at 03:28:28PM -0400, Ken Manheimer wrote:
Mailman-developers maillist - Mailman-developers@python.org http://www.python.org/mailman/listinfo/mailman-developers
"KM" == Ken Manheimer <klm@python.org> writes:
KM> It might be sufficient to offer a way to open a list
KM> "read-only", such that no locking is done. I think it would
KM> be better, however, to refine the locking system such that
KM> lists are not locked until they enter an operation that
KM> changes the list data in a way that will eventually need to be
KM> written. This would probably take a bit more effort than a
KM> "read-only" kluge, but may not be too bad.
Of course, many of these file-locking problems go away once you have a long-lived Mailman server. Then again, you'll have threading issues to deal with :-).
I haven't looked at what it is that Mailman is actually trying to lock (too busy interspersing autoconf hacking time with Real Work), but I'm guessing from what I've read that you're just trying to keep multiple writers from clobbering each other (and readers from reading inconsistent data while a writer is writing). It sounds like you don't really need to lock the file; instead you need to lock out access to a shared resource, and you want this to work across NFS.
I wasn't able to dig up the Netscape Mail + File Locking article that Jamie Zawinski wrote. I don't seem to have it in my bookmarks or anywhere easily grepable, and a web search turned up only stale links. I emailed Jamie to see if he's got an updated link and will forward it if I get a response.
If I remember correctly, the article basically argues that the best portable way to do resource locking across NFS is to create a file with O_CREAT | O_EXCL. That's what's meant by "dot-file" locking, because you usually create a .file that is mostly invisible. Since the check-for-existence and file-creation steps are atomic (on Solaris at least), you won't ever have two processes step on each other. An additional advantage is that just rm'ing the dot-file is enough to unlock the system. The disadvantage is that this operation may not be atomic on older versions of NFS, although I believe it is on NFS 3 (can't find a reference to this right now, though).
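A minimal Python sketch of dot-file locking, assuming a file system that honors `O_CREAT | O_EXCL` (per the caveat above, older NFS versions may not). The function names are hypothetical:

```python
import errno
import os


def try_dotfile_lock(path):
    """Try once to take the lock by creating `path` exclusively.

    Returns True if we now hold the lock, False if someone else does.
    The existence check and the creation happen in a single open() call,
    so two processes cannot both succeed where O_EXCL is honored.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise                      # permissions, missing directory, ...
    os.close(fd)
    return True


def dotfile_unlock(path):
    # Unlocking is just removing the dot-file ("rm" from a shell works too).
    os.unlink(path)
```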
On the other hand, creation of directories *is* atomic across all versions of NFS, AFAIK, so perhaps it makes more sense to create a dot-directory that locks the resource. You'd have to use mkdir() and check for EEXIST. You'd basically busy-loop in either case waiting for the lock to be given up.
So creating a dot-directory representing a lock is probably the most portable, safest, and easiest way to do it.
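A sketch of the dot-directory variant in Python, taking Barry's premise that `mkdir()` is atomic on the file systems involved. `mkdir_lock()`, `mkdir_unlock()` and the polling interval are hypothetical choices:

```python
import errno
import os
import time


def mkdir_lock(path, poll=0.05, timeout=None):
    """Block until we create directory `path`; mkdir() is the atomic step."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            os.mkdir(path)
            return                 # we created it, so we hold the lock
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise              # permissions, missing parent, ...
        # Someone else holds it: the busy-loop, with a short nap each pass.
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("lock %s still held" % path)
        time.sleep(poll)


def mkdir_unlock(path):
    os.rmdir(path)
```

Sleeping between attempts keeps the loop from pegging the CPU, and the optional `timeout` lets a caller such as a CGI script give up rather than hang the way the listinfo page did.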
-Barry
Barry A. Warsaw wrote: [a lot :]
On the other hand, creation of directories *is* atomic across all versions of NFS, AFAIK, so perhaps it makes more sense to create a dot-directory that locks the resource. You'd have to use mkdir() and check for EEXIST. You'd basically busy-loop in either case waiting for the lock to be given up.
So creating a dot-directory representing a lock is probably the most portable, safest, and easiest way to do it.
Another easy way is to simply use the "lockfile" utility that comes with procmail. Procmail's install script checks the locking abilities of the particular installation and does intense testing, and I never had any problems with it when I was using SmartList, a procmail-based minidomo. (BTW, reading the SmartList source can cause serious damage to your brain; it is write-only procmail and shell code.)
If you need a quick solution that works right away, I can recommend this; at least give it a slot in your "tips&tricks" box.
--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net
10553 Berlin : PGP key -> http://pgpkeys.mit.edu
we're tired of banana software - shipped green, ripens at home
On Tue, May 05, 1998 at 03:45:47PM -0400, Barry A. Warsaw wrote:
I wasn't able to dig up the Netscape Mail + File Locking article that Jamie Zawinski wrote. I don't seem to have it in my bookmarks or easily grepable and a web search turned up only stale links. I emailed Jamie to see if he's got an updated link and will forward that if I get a response.
Hmm, I got it on a web search: http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/movemail.html
If I remember correctly, the article basically justifies the point of view that the best portable way to do resource locking across NFS is by creating a file with O_CREAT | O_EXCL. That's what's meant by "dot-file" locking because you usually create a .file that is mostly invisible.
I have yet to find much on whether there are race conditions w/ O_EXCL and directories. However, the open() man page on my system has the following to say:
O_EXCL When used with O_CREAT, if the file already exists
it is an error and the open will fail. O_EXCL is
broken on NFS file systems, programs which rely on
it for performing locking tasks will contain a race
condition. The solution for performing atomic file
locking using a lockfile is to create a unique file
on the same fs (e.g., incorporating hostname and
pid), use link(2) to make a link to the lockfile
and use stat(2) on the unique file to check if its
link count has increased to 2. Do not use the
return value of the link() call.
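One attempt at the man page's link(2) recipe, sketched in Python. The `lockfile.hostname.pid` naming of the unique file and the function names are assumptions; the unique file only has to be per-process and on the same file system as the lock file:

```python
import os
import socket


def _unique_name(lockfile):
    # Per-process name on the same file system (hostname + pid, as the
    # man page suggests); the exact format is an assumption.
    return "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())


def link_lock(lockfile):
    """Make one attempt at the link(2) lock; return True if we hold it."""
    unique = _unique_name(lockfile)
    with open(unique, "w") as f:
        f.write("%s %d\n" % (socket.gethostname(), os.getpid()))
    try:
        os.link(unique, lockfile)
    except OSError:
        pass                       # don't trust link()'s result; check below
    if os.stat(unique).st_nlink == 2:
        return True                # lockfile is now another name for unique
    os.unlink(unique)              # someone else holds it; clean up
    return False


def link_unlock(lockfile):
    os.unlink(lockfile)            # this releases the lock
    os.unlink(_unique_name(lockfile))
```

Checking `st_nlink` instead of `link()`'s return value covers the NFS case where the server performs the link but the reply is lost, so the client sees an error even though it actually holds the lock.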
The problem with this solution is: what happens if a process dies while holding a lock? The most straightforward fix is to drop a pid into the lock file and have the process wishing to grab the lock check whether that process is still running.
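A POSIX-only sketch of the stale-lock check, assuming the lock file holds `hostname pid`. The helper names are hypothetical; the hostname is recorded because a pid can only be checked on the machine that owns it:

```python
import errno
import os
import socket


def write_owner(lockfile):
    """Record the lock holder as 'hostname pid' (call after acquiring)."""
    with open(lockfile, "w") as f:
        f.write("%s %d\n" % (socket.gethostname(), os.getpid()))


def pid_alive(pid):
    """True if a process with this pid exists on this host (POSIX only)."""
    try:
        os.kill(pid, 0)            # signal 0 checks existence, sends nothing
    except OSError as e:
        return e.errno == errno.EPERM   # exists but belongs to another user
    return True


def lock_is_stale(lockfile):
    """Guess whether the process named in `lockfile` has died."""
    try:
        with open(lockfile) as f:
            host, pid = f.read().split()
            pid = int(pid)
    except (OSError, ValueError):
        return False               # missing or half-written: leave it alone
    if host != socket.gethostname():
        return False               # can't check a pid on another machine
    return not pid_alive(pid)
```

This is a heuristic: a recycled pid makes a dead holder look alive, and a holder on another NFS client can't be checked at all, so it pairs naturally with a time-based expiry.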
Also, this is a polling-based solution, which isn't incredibly desirable either... But I guess there's no way around that one.
Ok, I'll implement a general library call for this one, I guess.
BTW, no one has put any effort into making pipermail portable yet, have they?
John
participants (4)
- Barry A. Warsaw
- Christian Tismer
- John Viega
- Ken Manheimer