Re: [Mailman-Developers] stalled connection: archive locks?
[Ricardo Kustner]
Hi,
well i've traced my problem back to the fact that the lock files in the archive directory are not being removed.
they appear as: 1999-May-subject.lock.<host-name>.<#####>
where <#####> is a number...
The number is the process ID (pid) of the locking process. If you do a "ps auxww | grep #####" (substituting "#####" with a number from a lock file), do any Mailman processes show up?
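The same check can be scripted instead of grepping ps output by hand. This is a minimal sketch, assuming the lock files follow the naming quoted above (something.lock.<host-name>.<pid>) and that the script runs on the host named in the lock file. The names LOCK_RE, pid_alive and stale_locks are made up for illustration and are not part of Mailman.

```python
import os
import re

# Assumed pattern, taken from the file names quoted in this thread:
#   <anything>.lock.<host-name>.<pid>
LOCK_RE = re.compile(r"\.lock\.(?P<host>.+)\.(?P<pid>\d+)$")


def pid_alive(pid):
    """Return True if a process with this pid exists on the local machine."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence (Unix)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_locks(directory):
    """Yield (filename, pid) for lock files whose pid is no longer running."""
    for name in sorted(os.listdir(directory)):
        match = LOCK_RE.search(name)
        if match:
            pid = int(match.group("pid"))
            if not pid_alive(pid):
                yield name, pid
```

Calling list(stale_locks(path_to_archive_dir)) would then list the lock files left behind by processes that have already exited. The pid test is only meaningful for locks taken on the local machine.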
does anybody have the same problems?
i think this might be what causes the process overload on our server, when an impatient moderator doesn't wait for the stalled page to finish completely and approves several posts at once, all of them waiting for lock files to be released...?
I'll look into it -- this may be related to a problem J C Lawrence has reported (with subject "Performance under load (bursty message flow)").
Harald
Hi,
On 22-May-99 Harald Meland wrote:
The number is the process ID (pid) of the locking process. If you do a "ps auxww | grep #####" (substituting "#####" with a number from a lock file), do any Mailman processes show up?
i tried once to look for that pid but i couldn't find it in ps... i'll try it again.
Looks like one of your pipermail archive databases (located under ~mailman/archives/private/LISTNAME/database/) is corrupt. Sorry to say, I don't know very much about these database files -- but, if you have a plain mbox archive of the list as well (located in ~mailman/archives/private/LISTNAME.mbox/LISTNAME.mbox), I guess the
thanks for the help... i've re-archived the mailing list mbox, but now i'm getting errors in other places... to be honest, my mailman setup is a bit messy because i had to hurry with switching from majordomo and the permission problems were taking too much time... it's really confusing that several different uids on the server use the system (apache, exim and mailman through cron)... maybe i need to start cleaning it up... if only there was some information somewhere on exactly what the permission settings need to be for every directory/file (most of my directories are SGID)... i'd give my right arm for a check_permissions.sh script ;)
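A rough idea of what such a permissions check could look like, written in Python rather than shell. The rules it applies are assumptions for illustration only, not Mailman's documented requirements: everything under the tree belongs to one expected group, directories carry the setgid bit, and files are group-writable. The name check_tree is hypothetical.

```python
import os
import stat


def check_tree(root, expected_gid):
    """Return a list of (path, problem) tuples found under root.

    Assumed rules (not taken from the Mailman docs): one shared group,
    setgid directories, group-writable files.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        st = os.stat(dirpath)
        if st.st_gid != expected_gid:
            problems.append((dirpath, "wrong group"))
        if not st.st_mode & stat.S_ISGID:
            problems.append((dirpath, "directory is not setgid"))
        for name in filenames:
            path = os.path.join(dirpath, name)
            fst = os.lstat(path)
            if stat.S_ISLNK(fst.st_mode):
                continue  # skip symlinks; their own mode bits are not useful
            if fst.st_gid != expected_gid:
                problems.append((path, "wrong group"))
            if not fst.st_mode & stat.S_IWGRP:
                problems.append((path, "not group-writable"))
    return problems
```

Running it as each of the uids involved (web server, mail server, cron) and comparing the output would show which account sees which problem.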
ps: don't you all agree mailman needs more publicity?
Well, if it's _good_ publicity, then yes, of course :)
at first i subscribed to the digest of mailman-users 'cause i was afraid i would get too much email... but it's nothing i can't handle...
Ricardo.
Hi,
First of all, the connection stalls (i see a zombie python hanging around then, btw) *after* an approved post has been submitted *and* the next page has been completely built on the screen (except for the fact that the browser is still expecting data)... when i look at the source of ~mailman/Mailman/Cgi/admindb.py i see this:
        PrintRequests(doc)
        text = doc.Format(bgcolor="#ffffff")
        print text
        sys.stdout.flush()
    finally:
        list.Unlock()
so print text seems to be ok... so it either hangs in the flush() or maybe the unlock fails? any hints on debugging this thing?
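One way to narrow this down is to log a timestamp before and after each suspect step. The helper below is a minimal sketch in modern Python (the with statement did not exist in the Python of 1999), and the name timed is made up for illustration. It writes to stderr, which for a CGI script usually ends up in the web server's error log.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=sys.stderr):
    """Log when the wrapped block starts and how long it took, even if it raises."""
    start = time.time()
    log.write("%s: start\n" % label)
    log.flush()
    try:
        yield
    finally:
        log.write("%s: done after %.2fs\n" % (label, time.time() - start))
        log.flush()


# Example: time the flush on its own.
with timed("flush"):
    sys.stdout.flush()
# The same wrapper could go around the unlock call in the excerpt above.
```

If the "start" line for a step appears in the log without a matching "done" line for several seconds, that step is where the request is stalling.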
Thanks, Ricardo.
Hi,
On 22-May-99 Harald Meland wrote:
i think this might be what causes the process overload on our server, when an impatient moderator doesn't wait for the stalled page to finish completely and approves several posts at once, all of them waiting for lock files to be released...? I'll look into it -- this may be related to a problem J C Lawrence has reported (with subject "Performance under load (bursty message flow)").
I guess so -- it happened again this morning when a moderator was approving some posts, the server got an unbelievably high load: it took me about 30 minutes to get into a telnet session and issue a shutdown :( Every time a post is approved, the connection stalls for exactly 15 seconds (which happens to be exactly the timeout of the lockfiles in mailman)... usually when i have had to do a shutdown, i have to replace ~mailman/lists/list-name/config.db with config.db.latest because it has become corrupted ("no such list"). anyway, the server also handles mail and is supposed to be up 24h a day and i hope i can fix this soon 'cause this is getting unworkable :(... and this way i never get a cool uptime :)
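A stall that matches the lock timeout is what one would expect if a lock file is never removed: every later request polls until the timeout runs out. The sketch below illustrates that behaviour with a plain exclusive-create lock file. It is not Mailman's own locking code, and the function name acquire and its parameters are assumptions for illustration.

```python
import os
import time


def acquire(lockpath, timeout=15.0, poll=0.1):
    """Try to create lockpath exclusively; give up after `timeout` seconds.

    Returns True if the lock file was created, False on timeout.
    """
    deadline = time.time() + timeout
    while True:
        try:
            fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone else holds the lock (or left a stale file behind).
            if time.time() >= deadline:
                return False
            time.sleep(poll)
        else:
            os.close(fd)
            return True
```

With a stale lock file in place, each call blocks for the full timeout before giving up, so several approvals submitted in quick succession would each tie up a process for that long.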
ps: sorry for cross-posting in mailman-users and develop ...
Ricardo.
Hi,
On 26-May-99 Ricardo Kustner wrote:
I'll look into it -- this may be related to a problem J C Lawrence has reported (with subject "Performance under load (bursty message flow)"). I guess so -- it happened again this morning when a moderator was approving some posts, the server got an unbelievably high load: it took me about 30 minutes to
well... i just disabled mail archiving and now the stalled connection dropped back from 15 seconds to about 4... seems an improvement, but it still doesn't look right and i still have a "<zombie>" python process hanging around during these few seconds... i wish mailman had some debug flag so I could see at what point it is "stalling"...
Ricardo.
participants (2): Harald Meland, Ricardo Kustner