Re: [Mailman-Developers] race condition in locking ?
On Sat, Feb 05, 2000 at 08:56:53PM +0100, Ricardo Kustner wrote (in private):
Traceback (innermost last):
  File "/usr/local/mailman/scripts/driver", line 112, in run_main
    main()
  File "../Mailman/Cgi/admindb.py", line 123, in main
    mlist.Save()
  File "/usr/local/mailman/Mailman/MailList.py", line 819, in Save
    self.SaveRequestsDb()
  File "../Mailman/ListAdmin.py", line 90, in SaveRequestsDb
    self.__closedb()
  File "../Mailman/ListAdmin.py", line 74, in __closedb
    assert self.Locked()
AssertionError:
if you need to see the logs/error or logs/lock files let me know...
[ CC: the list, because of the questions at the bottom ]
Hmmm. I think I found the problem, but I'm not entirely sure. It looks like the default MailList lock timeout is 60 seconds, and that that is too short for your machine. You should see some 'stolen!' or (in my LockFile version) 'broken!' messages, in your logs/locks file.
In any case, you can easily try it out; in Mailman/MailList.py, on or around line 282, there should be a 'lifetime = 60', inside the constructor for the maillists' lockfile. Changing the '60' to, say, '600', should give you better mileage, at least until your machine gets so heavily loaded that a simple admin request takes ten full minutes to process ;)
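To illustrate why a lifetime shorter than the request can trip the assertion in the traceback, here is a toy model. It is only a sketch: ToyLock and admin_request_survives are made-up names, time is passed in by hand, and none of this is Mailman's actual LockFile code.

```python
class ToyLock:
    """Hypothetical stand-in for a lock with a lifetime (not Mailman's LockFile)."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.owner = None
        self.expires = 0

    def lock(self, who, now):
        # A held lock can only be taken over once its lifetime has passed.
        if self.owner is not None and now < self.expires:
            return False
        self.owner = who
        self.expires = now + self.lifetime
        return True

    def locked(self, who):
        return self.owner == who


def admin_request_survives(lifetime, work_seconds):
    """Return True if the first holder still owns the lock when its work ends."""
    lock = ToyLock(lifetime)
    lock.lock("admindb", now=0)
    # Another process tries for the lock just before admindb finishes.
    lock.lock("other", now=work_seconds - 1)
    return lock.locked("admindb")


print(admin_request_survives(lifetime=60, work_seconds=120))   # False
print(admin_request_survives(lifetime=600, work_seconds=120))  # True
```

With a 120-second request, the 60-second lock is taken over before the work is done, which is the situation where a later "assert self.Locked()" would fail.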
I'm not sure what the Right Fix is, however. MailList.MailList does not have an advertised way of setting the lock lifetime, nor of refresh()ing the lock. Maybe someone with more experience (barry ? any other active developers) can shed light here... Is the usual solution to raise the initial lock timeout, passing lock timeout in the constructor, adding the two functions and calling them where appropriate, or just using __lock directly ?
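Two of the options above (passing the lifetime into the constructor, and refreshing the lock while working) could look roughly like this. This is a sketch under assumptions: RefreshableLock and process_all are hypothetical names, the lock lives in memory only, and the real MailList/LockFile interfaces may differ.

```python
import time


class RefreshableLock:
    """Hypothetical in-memory lock; illustrative only, not Mailman's API."""

    def __init__(self, lifetime=60, clock=time.time):
        self.lifetime = lifetime      # caller chooses the lifetime
        self._clock = clock           # injectable so it can be tested
        self._expires = None

    def lock(self):
        self._expires = self._clock() + self.lifetime

    def locked(self):
        return self._expires is not None and self._clock() < self._expires

    def refresh(self, lifetime=None):
        # Extend a lock we still hold; refuse if it has already expired.
        if not self.locked():
            raise RuntimeError("lock expired before refresh")
        if lifetime is not None:
            self.lifetime = lifetime
        self._expires = self._clock() + self.lifetime

    def unlock(self):
        self._expires = None


def process_all(items, lock, handle):
    """Hold the lock for a long job, refreshing it after each item."""
    lock.lock()
    try:
        for item in items:
            handle(item)
            lock.refresh()
    finally:
        lock.unlock()
```

Refreshing after each item means the lifetime only has to cover one item, not the whole request.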
BTW, whoever documented that part of the code, thank you ;) with the 'TBD' remark above the lockfile constructor this was a sucker to find. It really means a lot to have it so well documented.
-- Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Sat, Feb 05, 2000 at 11:21:30PM +0100, Thomas Wouters wrote:
You should see some 'stolen!' or (in my LockFile version) 'broken!' messages, in your logs/locks file.
Er, apparently not. Somehow some log messages got lost during the move from lock() to __break() ;P If you want to be sure, you can add something like this by hand:
    self.__writelog("stealing lock (from %s)" % winner)
As for the locking problem, I think I see a better solution. A site-global 'lock lifetime multiplier' that the site admin can set, by which both the lifetime values of lockfiles and the lock timeout values given to LockFile.lock() are multiplied. By leaving the default to 1 we leave everything the same, but when a slow site is experiencing expired but still used locks, they can up the multiplier and get consistent behaviour all 'round -- no spurious locktimeout errors because lifetime is high but timeout isn't :P
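A rough sketch of the multiplier idea follows. The setting name LOCK_LIFETIME_MULTIPLIER, the helper functions, and the default timeout of 10 seconds are all assumptions made for the example, not existing Mailman settings.

```python
# Hypothetical site-wide setting; 1 leaves every value unchanged.
LOCK_LIFETIME_MULTIPLIER = 1


def scaled(seconds, multiplier=None):
    """Scale a lock-related duration by the site-wide multiplier."""
    if multiplier is None:
        multiplier = LOCK_LIFETIME_MULTIPLIER
    return seconds * multiplier


def lock_settings(lifetime=60, timeout=10, multiplier=None):
    """Scale lifetime and timeout together so they stay consistent."""
    return {
        "lifetime": scaled(lifetime, multiplier),
        "timeout": scaled(timeout, multiplier),
    }


print(lock_settings())               # {'lifetime': 60, 'timeout': 10}
print(lock_settings(multiplier=5))   # {'lifetime': 300, 'timeout': 50}
```

Because both values go through the same function, a slow site cannot end up with a long lifetime but a short timeout.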
-- Thomas Wouters <thomas@xs4all.net>
On Sat, Feb 05, 2000 at 11:21:30PM +0100, Thomas Wouters wrote:
On Sat, Feb 05, 2000 at 08:56:53PM +0100, Ricardo Kustner wrote (in private):
AssertionError:
if you need to see the logs/error or logs/lock files let me know...
[ CC: the list, because of the questions at the bottom ]
Hmmm. I think I found the problem, but I'm not entirely sure. It looks like the default MailList lock timeout is 60 seconds, and that that is too short for your machine. You should see some 'stolen!' or (in my LockFile version) 'broken!' messages, in your logs/locks file.
yes i did see "stolen!" in the logs just before the process was about to crash.
In any case, you can easily try it out; in Mailman/MailList.py, on or around line 282, there should be a 'lifetime = 60', inside the constructor for the maillists' lockfile. Changing the '60' to, say, '600', should give you better mileage, at least until your machine gets so heavily loaded that a simple admin request takes ten full minutes to process ;)
thanks a lot :) I'm trying out the value '300'... just to be sure...
with the earlier versions of MM, first I used exim and that made the load way too high (around 60) and often made it impossible to use... after I started using postfix, things went much better.
I'm not sure what the Right Fix is, however. MailList.MailList does not have
I'll let you know how it works out after changing the lock timeout...
Ricardo.
On Mon, Feb 07, 2000 at 08:43:31PM +0100, Ricardo Kustner wrote:
In any case, you can easily try it out; in Mailman/MailList.py, on or around line 282, there should be a 'lifetime = 60', inside the constructor for the maillists' lockfile. Changing the '60' to, say, '600', should give you
yes i did see "stolen!" in the logs just before the process was about to crash. thanks a lot :) I'm trying out the value '300'... just to be sure...
well I just tried it out... I approved 17 posts and didn't get an assertion error! though it took quite a while before the webpage was finished loading (about 2 minutes) but the process load on the machine didn't get any higher than 2.5
I wonder though what happens if some impatient moderator decides not to wait before the page finishes loading, switches to a different webpage and therefore breaks the python cgi process... will some approved posts stay in the queue instead?
I mentioned before that I think that it could be better if the cgi scripts don't do anything more than just "mark" messages as being approved... cgi scripts should have a short life time and definitely shouldn't be waiting for something too long...
btw i used the original LockFile.py this time...
Ricardo.
On Mon, Feb 07, 2000 at 09:31:44PM +0100, Ricardo Kustner wrote:
I wonder though what happens if some impatient moderator decides not to wait before the page finishes loading, switches to a different webpage and therefore breaks the python cgi process... will some approved posts stay in the queue instead?
Try it out ! ;) It depends on the exact behaviour of both the webbrowser and the webserver. I haven't checked the Mailman code but i assume it either continues with its jobs until it tries to write output, usually at the end of the script, or it gets a signal and cleans up nicely. Postings shouldn't disappear, if that's what you mean.
I mentioned before that I think that it could be better if the cgi scripts don't do anything more than just "mark" messages as being approved... cgi scripts should have a short life time and definitely shouldn't be waiting for something too long...
I agree, and I think I've seen more comments and postings talking about it. No one has implemented it yet, though, and the current method works too well for it to be a real problem, I guess ;)
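For what it's worth, the "only mark it" approach could be sketched like this. Everything here is hypothetical (the spool directory layout, mark_approved, drain, and the JSON format); it only shows the shape of the idea, not how Mailman stores requests.

```python
import json
import os
import tempfile


def mark_approved(spool_dir, request_id):
    """What the CGI would do: record the decision and return at once."""
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fp:
        json.dump({"id": request_id, "action": "approve"}, fp)
    # Renaming into place means the worker never sees a half-written file.
    final_path = os.path.join(spool_dir, "%s.json" % request_id)
    os.replace(tmp_path, final_path)
    return final_path


def drain(spool_dir, deliver):
    """What a separate, long-lived process would do with the marks."""
    handled = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue  # skip files still being written
        path = os.path.join(spool_dir, name)
        with open(path) as fp:
            decision = json.load(fp)
        deliver(decision["id"])
        os.remove(path)  # only forget the mark once delivery returned
        handled.append(decision["id"])
    return handled
```

The CGI's share of the work is then one small file write, so an impatient moderator closing the page has very little to interrupt.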
btw i used the original LockFile.py this time...
Go ahead and try it with mine -- it should make the waits you have to endure a tad shorter. The problem with the old locking mechanism is that a lot of processes end up butting heads in a painful way when they try to grab the lock. In the modified one, the first process to try and get the lock after it is freed, actually gets the lock.
-- Thomas Wouters <thomas@xs4all.net>
On Mon, Feb 07, 2000 at 10:59:07PM +0100, Thomas Wouters wrote:
On Mon, Feb 07, 2000 at 09:31:44PM +0100, Ricardo Kustner wrote:
I wonder though what happens if some impatient moderator decides not to wait before the page finishes loading, switches to a different webpage and therefore breaks the python cgi process... will some approved posts stay in the queue instead?
Try it out ! ;) It depends on the exact behaviour of both the webbrowser and the webserver. I haven't checked the Mailman code but i assume it either continues with its jobs until it tries to write output, usually at the end of the script, or it gets a signal and cleans up nicely.
if you approve a bunch of posts it's a bit difficult to figure out which posts arrive... especially since my server needs a bit of time to have sent out all the posts... i never really timed how long it takes before all posts are sent out (i don't care that much about that though... people need to wait anyway since the list is being moderated)
I mentioned before that I think that it could be better if the cgi scripts don't do anything more than just "mark" messages as being approved... cgi scripts should have a short life time and definitely shouldn't be waiting for something too long...
I agree, and I think I've seen more comments and postings talking about it. No one has implemented it yet, though, and the current method works too well for it to be a real problem, I guess ;)
it's a big improvement that 1.2 now keeps the pending posts outside of the config.db, so i'm happy with that ... i have been one of the people begging for that :)
btw i used the original LockFile.py this time...
Go ahead and try it with mine -- it should make the waits you have to endure
could you send me your latest version (complete file is ok, and a bit easier for me :) ) thanks...
Ricardo.
participants (2)
- Ricardo Kustner
- Thomas Wouters