[Mailman-Users] Mailman b5 lockup

Zeljko Vrba zvrba at iskon.hr
Tue Sep 5 08:46:33 CEST 2000


On Mon, Sep 04, 2000 at 01:08:15PM -0400, David Gilbert wrote:
> >>>>> "Zeljko" == Zeljko Vrba <zvrba at iskon.hr> writes:
> 
> Zeljko> I'm experiencing lockups with qrunner process; this is the
> Zeljko> strace session:
> Zeljko> send(6, "ehlo inje.iskon.hr\r\n", 20, 0) = 20
> Zeljko> read(7, "250-inje.iskon.hr Hello root at loc"..., 4096) = 155
> Zeljko> send(6, "mail FROM:<ex-yu-a-lista-admin at z"..., 53, 0) = 53
> Zeljko> read(7, "250 <ex-yu-a-lista-admin at zamir.n"..., 4096) = 50
> Zeljko> send(6, "rcpt TO:<oliver.sertic at fpzg.hr>\r"..., 33, 0) = 33
> Zeljko> read(7,
> 
> Zeljko> It blocks indefinitely in this last read. There exists no MX
> Zeljko> record for the fpzg.hr domain. Is this a bug in mailman,
> Zeljko> sendmail or both? Or BIND?  I'm using mailman as a
> Zeljko> production-level mailing list system and this is pretty
> Zeljko> urgent. Please help!
> 
> That's definitely different to my experience.  I'm getting a qrunner
> lockup with 0 bytes transmitted and 0 bytes received.
> 
> I'm thinking that it's some interaction with FreeBSD, but I can't nail 
> it down.
> 
I'm running it on Linux. However, it's not a real lockup. After a timeout
period (the smaller of the connect timeout and the timeout configured in
sendmail) it gives up and proceeds to send other messages. Undelivered
messages stay in the queue. And this is my problem: undelivered messages pile
up, and sometimes qrunner takes hours to deliver all messages. I've seen a
message that was sent at 10:15 and only delivered at 15:32. All members on the
list are local to the machine on which Mailman is running.

A temporary solution is to disable delivery to people who are subscribed from
broken domains; however, with 80 lists this is tedious, slow and error-prone.

Now I'm going to dig into the sources and possibly implement a timeout when
communicating with sendmail.
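
A minimal sketch of what such a timeout could look like, assuming a present-day
Python where smtplib.SMTP accepts a timeout argument. This is not Mailman's
actual delivery code: try_deliver and the smtp_factory parameter are
hypothetical names introduced only for illustration.

```python
import smtplib


def try_deliver(host, sender, recipient, message, timeout=30.0,
                smtp_factory=smtplib.SMTP):
    """Attempt one delivery; return True on success, False on failure.

    Hypothetical helper. The timeout (in seconds) bounds every blocking
    socket operation, so a read that never gets an answer raises an
    OSError subclass instead of hanging the whole queue run.
    """
    try:
        conn = smtp_factory(host, timeout=timeout)
    except (OSError, smtplib.SMTPException):
        # Could not connect, or no greeting arrived within the timeout.
        return False
    try:
        conn.sendmail(sender, [recipient], message)
        return True
    except (OSError, smtplib.SMTPException):
        # Timed out or rejected mid-session; leave the message queued.
        return False
    finally:
        try:
            conn.quit()
        except (OSError, smtplib.SMTPException):
            # The connection may already be dead after a timeout.
            pass
```

With something like this, a stalled "rcpt TO" costs at most the chosen timeout
per attempt rather than whatever sendmail is configured to wait.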

The best solution would be for qrunner to remember broken domains: that way it
would only wait for the first message to such a domain; for other messages to
that domain it wouldn't even try delivery (in one run). So the "lockup"
wouldn't last Q*3min (Q is the total number of people from broken domains),
but only D*3min (D is the number of different broken domains). In my case
Q > 20, D=2.
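
The idea of remembering broken domains within one run could be sketched roughly
as follows. This is only an illustration, not qrunner code: deliver_run and the
try_deliver callback are hypothetical names, and the sketch assumes that
try_deliver returns False only when the destination domain timed out (not for
a failure specific to one recipient).

```python
def deliver_run(queue, try_deliver):
    """Deliver (recipient, message) pairs, skipping known-broken domains.

    Hypothetical sketch. Returns (delivered, deferred, failed_attempts),
    where failed_attempts counts how many times the run actually waited
    for a timeout.
    """
    broken = set()            # domains that already timed out in this run
    delivered, deferred = [], []
    failed_attempts = 0
    for recipient, message in queue:
        domain = recipient.rsplit("@", 1)[-1].lower()
        if domain in broken:
            # Do not even try: keep the message queued for a later run.
            deferred.append((recipient, message))
            continue
        if try_deliver(recipient, message):
            delivered.append((recipient, message))
        else:
            # Assumed to mean a domain-level timeout (see lead-in).
            failed_attempts += 1
            broken.add(domain)
            deferred.append((recipient, message))
    return delivered, deferred, failed_attempts


# Example with made-up addresses: 22 recipients in 2 broken domains.
queue = [("user%d@broken1.example" % i, "msg") for i in range(11)]
queue += [("user%d@broken2.example" % i, "msg") for i in range(11)]
_, deferred, waits = deliver_run(queue, lambda rcpt, msg: False)
print(len(deferred), waits)   # 22 messages deferred, but only 2 timeouts
```

With a 3 minute timeout, the run above would wait about 2*3 = 6 minutes
instead of 22*3 = 66 minutes.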




