Re: "could not acquire qrunner lock", etc
On 23 March 2001, Bill Bradford wrote:
I'm suffering from a similar problem. I run a list with a few hundred members that has a handful of posts an hour throughout most of the day. About thirty people are on the digest list.
I'm running the Debian unstable mailman package 2.0.3-7 recompiled myself on a stable (Debian 2.2) system. I contacted the maintainer of the package and he suggested I post here. The machine is only a P133 with 96Mb of RAM but I would hope it should handle such a list.
The mailing lists are running fine, mails just take a long time (several hours) to pass through the system and it is putting an unreasonable load on the machine in my opinion. It has a knock-on effect of making the web interface unusable because the qrunner process perpetually has the list locked (maybe this is a clue).
The python qrunner process seems to be running most of the time pushing my load average above one constantly - it runs for its fifteen minutes and then gives up. At some points during a lull of several hours in mailing list activity it will stop of its own accord and everything will settle down.
Dave Klingler wrote in response to Bill Bradford:
If permissions were the problem, surely it wouldn't work at all? I've had a look through and everything generally seems to be owned by user list, group list, or owned by root, group list, with g+rwX permissions.
I tried putting some extra logging in the qrunner script but wasn't really sure what I was looking at. If someone can advise me where it would be best to log then I am willing to hack the scripts a bit.
TIA
-- Mike Crowe <mac@fysh.org>
I would do some tests on basic mail... Telnet to the box on port 25 and see how long it is before you get the greeting. Do the same from the box to an outside machine. Both should be *very* fast. If they are not, look into reverse resolution of the IP address of the mailman box.
qrunner can only deliver mail as fast as the mta, so look there also.
Steve
Steve Pirk orion@deathcon.com . deathcon.com . pirk.com . webops.com . t2servers.com
On Tue, 24 Apr 2001, Mike Crowe wrote:
On Tue, Apr 24, 2001 at 08:29:35AM -0700, Steve Pirk wrote:
qrunner can only deliver mail as fast as the mta, so look there also.
Thanks for the advice. I tried the following from the machine itself (note that any line wrapping is an artifact of the email):
```
babel:/tmp> time ((echo 'helo me' ; echo 'mail from: mac@fysh.org' ; echo 'rcpt to: mac@empeg.com' ; echo 'data' ; echo 'From: mac@fysh.org' ; echo 'To: mac@empeg.com' ; echo 'Subject: wibble' ; echo ; echo "wibble" ; echo "." ; echo "quit") | telnet localhost 25)
Trying 127.0.0.1...
Connected to localhost.fysh.org.
Escape character is '^]'.
220 babel.fysh.org ESMTP Exim 3.12 #1 Wed, 25 Apr 2001 11:10:51 +0100
Connection closed by foreign host.
( ( echo 'helo me'; echo 'mail from: mac@fysh.org'; echo ; echo 'data'; echo
0.01s user 0.02s system 8% cpu 0.352 total
```
I tried the same thing from another machine on a different network:
```
mac@morrison:~$ time ((echo 'helo me' ; echo 'mail from: mac@empeg.com' ; echo 'rcpt to: mac@babel.fysh.org' ; echo 'data' ; echo 'From: mac@empeg.com' ; echo 'To: mac@babel.fysh.org' ; echo 'Subject: wibble' ; echo ; echo "wibble" ; echo "." ; echo "quit") | telnet babel.fysh.org 25)
Trying 193.119.19.190...
Connected to babel.fysh.org.
Escape character is '^]'.
Connection closed by foreign host.

real    0m0.060s
user    0m0.030s
sys     0m0.030s
```
It looks pretty good.
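For reference, Steve's greeting-latency check can also be scripted. Here is a minimal Python sketch using the standard smtplib module; the host name passed in is whatever box you are testing:

```python
import smtplib
import time

def smtp_greeting_time(host, port=25, timeout=10):
    """Return (reply code, seconds) for the MTA's initial 220 greeting.

    A healthy MTA answers in well under a second; a long pause here
    often points at reverse DNS lookups on the connecting address.
    """
    start = time.time()
    conn = smtplib.SMTP(timeout=timeout)
    code, banner = conn.connect(host, port)  # blocks until the greeting arrives
    elapsed = time.time() - start
    conn.quit()
    return code, elapsed
```

On a working setup, `smtp_greeting_time("localhost")` should come back with code 220 almost instantly.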
In any case, if the MTA was the problem I wouldn't have expected the qrunner process to be using lots of CPU - surely it would just be blocked consuming nothing?
This morning I discovered that the qrunner process had taken over 300 minutes of CPU time and no mailing list traffic was being sent. Clearly it had got stuck somewhere that its 15 minute timeout didn't work. I killed the process and the web interface started working again. I don't think mail has started going out yet because my load is still quite low - I'm off on a lockfile hunt :-) There was nothing revealing in the qrunner log.
As you can see, I'm running Exim. I've read the README.EXIM file but I don't think it applies since I'm not trying to host lists on multiple domains. My exim.conf does not contain recipients_max, so the default value of zero (no limit) is being used.
TBH I'd like to just go back to Mailman 1 at this point but I'm worried that the database files will be incompatible.
-- Mike Crowe <mac@fysh.org>
On Wed, Apr 25, 2001 at 11:24:29AM +0100, I wrote:
I've done some further investigation myself by inserting syslog statements deep into the code until I could find the bit that is "hanging".
When qrunner starts, it processes a few toadmin and tolist jobs fine. It is getting stuck on torequest jobs.
It turns out that someone had sent a huge MIME-encoded attachment to the -request address. qrunner got stuck inside MailCommandHandler::ParseMailCommands, in the loop between the arrows, appending the junk line by line to the response email. I've commented out the loop since it probably isn't all that beneficial anyway. Maybe it would be a good idea to set an arbitrary limit on the number of lines that will be appended, to stop this happening again? Alternatively, maybe messages longer than a certain number of lines should not be processed at all? In its current form it acts as quite an effective DoS on a Mailman list.
```python
if not self.__dispatch.has_key(cmd):
    self.AddError(line, prefix='Command? ')
    if self.__errors >= MAXERRORS:
        self.AddError('\nToo many errors encountered; '
                      'the rest of the message is ignored:')
        # --->
        for line in lines[linecount+1:]:
            self.AddToResponse(line, trunc=0, prefix='> ')
        # <---
        break
```
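A minimal sketch of the line cap suggested above; `MAX_ECHO_LINES` and the `add_to_response` callback are hypothetical stand-ins, not real Mailman names:

```python
MAX_ECHO_LINES = 30  # hypothetical cap; pick whatever limit suits the list

def echo_remainder(lines, linecount, add_to_response):
    """Quote at most MAX_ECHO_LINES of the unprocessed message back to
    the sender, instead of echoing a huge attachment line by line."""
    remainder = lines[linecount + 1:]
    for line in remainder[:MAX_ECHO_LINES]:
        add_to_response('> ' + line)
    suppressed = len(remainder) - MAX_ECHO_LINES
    if suppressed > 0:
        add_to_response('[%d more lines suppressed]' % suppressed)
```

This keeps the error-report behaviour but bounds the work (and the size of the response) regardless of how large the incoming message is.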
I hope my understanding of this problem is correct and the above information is useful. I don't think I've got to the bottom of the CPU usage problem, but this was certainly part of it.
-- Mike Crowe <mac@fysh.org>
Continuing the bad netiquette of replying to myself...
On Sat, Apr 28, 2001 at 12:25:53PM +0100, I wrote:
I've discovered the cause of the CPU usage problem. I had ARCHIVE_TO_MBOX set to the default of 2. It appears that the archiver eats an awful lot of processor power - so much that the backlog caused by the earlier bug was taking ages to clear. The qrunner would sit in the second part of ToArchive::process for around fifteen minutes while holding the list lock, which stopped the web interface from working.
Once I discovered that's where the problem was I set ARCHIVE_TO_MBOX to 1 and suddenly huge amounts of email that had been delayed for nearly a week started flooding through.
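For anyone hitting the same wall, the knob goes in mm_cfg.py. As I understand the Mailman 2.0 settings, 2 archives to both the raw mbox and the internal pipermail archiver, while 1 keeps only the mbox (double-check the comments in Defaults.py for your version):

```python
# In mm_cfg.py -- skip the built-in pipermail archiver and keep
# only the raw mbox, which an external archiver can process later.
ARCHIVE_TO_MBOX = 1
```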
So, in summary, a P133 with 96Mb of RAM is not up to running even a relatively low volume list (50 messages per day) when ARCHIVE_TO_MBOX is set to 2. :-)
-- Mike Crowe <mac@fysh.org>
On Sat, Apr 28, 2001 at 09:09:38PM +0100, Mike Crowe wrote:
Yep, this is known; I've had the same problem on several servers, including sourceforge.net, which has 2GB of memory and dual 800MHz P3s or something, so don't feel inferior about your machine, it's the code that's at fault :-)
The good news is that the mm folks know about this and have redesigned the queueing system to avoid this holdup. That said, pipermail still won't be an archiver of choice; you are encouraged to use something else.
Empegs rule! :-) Marc
Microsoft is to operating systems & security .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f@merlins.org for PGP key
On Wed, May 02, 2001 at 12:14:44AM -0700, Marc MERLIN wrote:
On Sat, Apr 28, 2001 at 09:09:38PM +0100, Mike Crowe wrote:
[stuff about the archiver being slow]
Ah, I'm glad about that!
Is anything recommended? Ideally something that will turn the mbox files into nice HTML every night without keeping the list locked.
Empegs rule! :-)
I thought the email address was familiar :-)
-- Mike Crowe <mac@fysh.org>
On Wed, May 02, 2001 at 11:27:58AM +0100, Mike Crowe wrote:
That's what I thought of too, but there isn't any direct way to do this in Mailman. That said, nothing prevents you from running the archiver code by hand against the mailbox (something like `grepmail todaysdate mbox | formail -s scriptthatrunsarchiver`).
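The "archive just today's messages" idea can also be sketched with Python's standard mailbox module; the function below is illustrative, not part of Mailman:

```python
import email.utils
import mailbox

def messages_from_day(mbox_path, day):
    """Return messages in an mbox whose Date: header falls on `day`,
    ready to hand to an external archiver without touching list locks."""
    selected = []
    for msg in mailbox.mbox(mbox_path):
        hdr = msg.get('Date')
        if not hdr:
            continue
        try:
            parsed = email.utils.parsedate_to_datetime(hdr)
        except (TypeError, ValueError):
            continue  # skip messages with unparseable Date: headers
        if parsed.date() == day:
            selected.append(msg)
    return selected
```

A nightly cron job could feed the returned messages to whatever archiver you prefer, leaving the live list untouched.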
Pipermail, however, remains slow and inefficient (it can use huge amounts of CPU and RAM), and this is known; no one is actively maintaining the code, and people with real archiving needs are typically encouraged to use MHonArc.
Cheers, Marc
participants (3)

- Marc MERLIN
- Mike Crowe
- Steve Pirk