Mailman 3 shipstopper bug - Mailman-Developers

Jan. 9, 1999 · *local*


      I've reported this before, and I will reiterate it again. Moreover, I
would call it a "shipstopper" for the 1.0 release.
I'm using 1.0b6 here. This has hit me FIVE times in the past 24 hours.
Each time, I have to take manual action to fix things (as reported
before). Basically, shutting down the sendmail listener to stop the mail
loop, clearing the Mailman queue and sendmail queue of the loop-inducing
garbage, and then bringing it back up.
What happens is that somebody types in a request to the Mailman
subscription page with a *local* mail address (i.e. no domain name). The
confirmation email then gets sent. This bounces back to Mailman, but it
doesn't understand that it is a bounce and tries to process the dumb
thing. That fails massively (as seen in the attached mail), and responds
to mailer-daemon with the error. More on this in a bit.
What appears wonky is that Mailman seems to attempt to deliver the
confirmation over and over, nonstop.
Here is a portion of the mail log for RAB25492:
Jan  8 17:18:19 ns1 sendmail[25492]: RAA25492: <ch9517>... User unknown
Jan  8 17:18:19 ns1 sendmail[25492]: lost input channel from localhost
[127.0.0.1]
Jan  8 17:18:19 ns1 sendmail[25492]: RAA25492:
from=<hognews-request@eastsideharley.com>, size=0, class=0, pri=0,
nrcpts=0, proto=SMTP, relay=localhost [127.0.0.1]
Jan  8 17:18:19 ns1 sendmail[25492]: RAA25492: RAB25492: DSN:
<ch9517>... User unknown
Jan  8 17:18:27 ns1 sendmail[25492]: RAB25492:
to=|"/home/mailman/install/mail/wrapper mailcmd hognews",
delay=00:00:08, xdelay=00:00:07, mailer=prog, stat=Sent
Note the "lost input channel". I don't think that is right. The cycle
above repeats at 17:18:32. This goes on until I kill it.
In the above case, the mailer-deamon is responding to Mailman (the last
line). Mailman then processes that mail, and returns a message like the
attached garbage to root. I just deleted 1400 messages from my root
mailbox.
Some more information:
In logs/error, I see the following traceback repeated every 30 minutes:
Jan 08 11:12:07 1999 smtplib: Traceback (innermost last):
smtplib:   File "/home/mailman/install/cron/run_queue", line 31, in ?
smtplib:      OutgoingQueue.processQueue()
smtplib:   File "/home/mailman/install/Mailman/OutgoingQueue.py", line
38, in processQueue
smtplib:      Utils.TrySMTPDelivery(recip,sender,text,full_fname)
smtplib:   File "/home/mailman/install/Mailman/Utils.py", line 201, in
TrySMTPDelivery
smtplib:      con.send(to=recipient,frm=sender,text=text)
smtplib:   File "/home/mailman/install/Mailman/smtplib.py", line 75, in
send
smtplib:      self.getresp()
smtplib:   File "/home/mailman/install/Mailman/smtplib.py", line 147, in
getresp
smtplib:      raise bad, resp
smtplib: smtplib.error_proto :  550 <bzkeffer>... User unknown
I suspect that the "lost input channel" further above is due to a
similar condition. The traceback drops the socket, fails to log
anything, and fails to clear the confirmation email from the outgoing
queue. Mailman processes the error response from mailer-daemon and drops
it into the queue. It runs the queue which processes that response
(flooding the root email box) along with the confirmation (again), which
starts the loop again.
I also believe that I now understand why my machine died back in August
due to this bug, but that hasn't happened to me recently. Because of the
queueing nature of Mailman, there is only one loop occurring at a time.
HOWEVER: if that cron job wakes up and processes the queue, then it
starts a second, simultaneous loop. If enough of those 30-minute
cron-caused loop-creations occurs, then your machine is completely dead
as it thrashes, trying to process all those loops. I've been catching
them relatively soon in this past spate of them (although I did miss one
last night for a long while, causing my loadavg to hit 15... heh. it's
also a DNS server and its response became SLOWWWWWW... which led me to
look into it)
So, I see one or more errors here:

a possible traceback not being logged during Q processing
said traceback fouling the outgoing mail queue processing
should probably disallow domain-less addresses in all places
bounce detection needs to recognize the mail that caused the attached
response (read between the lines for the content)

And because this can swamp a box, I would highly recommend it gets fixed
:-)
thx
-g
--
Greg Stein, http://www.lyra.org/

shipstopper bug

Greg Stein

Barry A. Warsaw

tags

participants (2)