
On Tue, Apr 24, 2001 at 08:29:35AM -0700, Steve Pirk wrote:
I would do some tests on basic mail... Telnet to the box on port 25 and see how long it is before you get the greeting. Do the same on the box to an outside machine. Both should be *very* fast. If they are not, look into reverse resolution of the IP address of the mailman box.
qrunner can only deliver mail as fast as the mta, so look there also.
Thanks for the advice. I tried the following from the machine itself (note that the line wrapping is false):
babel:/tmp> time ((echo 'helo me' ; echo 'mail from: mac@fysh.org' ; echo 'rcpt to: mac@empeg.com' ; echo 'data' ; echo 'From: mac@fysh.org' ; echo 'To: mac@empeg.com' ; echo 'Subject: wibble' ; echo ; echo "wibble" ; echo "." ; echo "quit") | telnet localhost 25)
Trying 127.0.0.1...
Connected to localhost.fysh.org.
Escape character is '^]'.
220 babel.fysh.org ESMTP Exim 3.12 #1 Wed, 25 Apr 2001 11:10:51 +0100
Connection closed by foreign host.
( ( echo 'helo me'; echo 'mail from: mac@fysh.org'; echo ; echo 'data'; echo 0.01s user 0.02s system 8% cpu 0.352 total
I tried the same thing from another machine on a different network:
mac@morrison:~$ time ((echo 'helo me' ; echo 'mail from: mac@empeg.com' ; echo 'rcpt to: mac@babel.fysh.org' ; echo 'data' ; echo 'From: mac@empeg.com' ; echo 'To: mac@babel.fysh.org' ; echo 'Subject: wibble' ; echo ; echo "wibble" ; echo "." ; echo "quit") | telnet babel.fysh.org 25)
Trying 193.119.19.190...
Connected to babel.fysh.org.
Escape character is '^]'.
Connection closed by foreign host.

real 0m0.060s
user 0m0.030s
sys 0m0.030s
It looks pretty good.
In any case, if the MTA were the problem I wouldn't have expected the qrunner process to be using lots of CPU - surely it would just be blocked, consuming nothing?
This morning I discovered that the qrunner process had used over 300 minutes of CPU time and no mailing list traffic was being sent. Clearly it had got stuck somewhere its 15-minute timeout didn't take effect. I killed the process and the web interface started working again. I don't think mail has started going out yet because my load is still quite low - I'm off on a lockfile hunt :-) There was nothing revealing in the qrunner log.
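For the lockfile hunt, something along these lines would list the candidates. It is only a minimal sketch: the function name stale_locks is made up for illustration, the lock directory is an assumption (pass whatever directory your installation actually uses), and the 15-minute default just mirrors the qrunner timeout mentioned above. It lists files and deletes nothing.

```shell
#!/bin/bash
# stale_locks DIR [MINUTES]
# Print regular files directly inside DIR whose last modification
# is more than MINUTES minutes old (default: 15).
# Hypothetical helper: DIR is whatever lock directory you want to
# inspect; nothing here is specific to Mailman.
stale_locks() {
    local lock_dir=$1
    local minutes=${2:-15}
    find "$lock_dir" -maxdepth 1 -type f -mmin +"$minutes" -print
}

# Example (the path is a placeholder, not a known location):
#   stale_locks /path/to/mailman/locks 15
```

Anything it prints is worth checking against the running processes before removal, since a lock that is old but still held by a live process is not stale.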
As you can see I'm running exim. I've read the README.EXIM file but I don't think it applies since I'm not trying to host lists on multiple domains. My exim.conf does not contain recipients_max so the default value of zero is being used.
TBH I'd like to just go back to Mailman 1 at this point but I'm worried that the database files will be incompatible.
-- Mike Crowe <mac@fysh.org>