In need of help. Trying to recover from a serious buffer overflow
I'm list admin/moderator of a 450+ member list that ran into trouble yesterday.
First, we're running Mailman 2.1.8 on a current release of OpenBSD. And it's been running for 2+ years with no issues, until this past Friday morning when the user db became corrupt.
Content filtering is set to remove all attachments and graphics, and we'd determined that more than 15 CCs would hold an inbound post to moderate. No message of more than 60 kb will be accepted.
On Wednesday, a member submitted a large email with poor formatting to the list. And it had enough CCs attached to it that it caused a buffer overflow, which resulted in the message being self-approved for the list. Suddenly the server began repeating itself as it tried to process this huge message over and over again. It became a vicious cycle - as the one message finally repeated out, other messages that were caught in the queue became ammunition for the server to repeat. Prior to the system crash, more than 6100 duplicate messages were sent to 450 members, at an estimated peak rate of about 20,000 outbound messages per half hour.
The user db became corrupt, and my server admin and I have worked to get things operational again. At this point, the list is supposedly behaving itself, yet no messages are actually being sent from the server. I don't have access to the raw logs, but here is the snippet of discussion about the restoration of the user database.
Jason: here's what I see on the server.
# pwd
/var/spool/mailman/lists/cyberscots
# file *
config.pck: data
config.pck.last: data
digest.mbox: UTF-8 Unicode mail text
en: setgid directory
pending.pck: ASCII text
request.pck: data
and the restore at home...
/backups/spool/mailman/lists/cyberscots
# file *
config.pck: smtp mail text
config.pck.last: smtp mail text
digest.mbox: ISO-8859 mail text
en: directory
pending.pck: smtp mail text
request.pck: data
the *.pck files from the restore all look like regular mbox files.
but the ones on the server are some sort of binary config file.
I checked against a different list, same thing.
on the server...
/var/spool/mailman/lists/lynches
# file *
config.pck: data
config.pck.last: data
we got a user db restored, at which point the gui became functional again, and we cleared the outbound queue, which had 1400+ messages that were blocked by a provider who'd had enough. And now, although I can send emails to the list, they do not seem to make it all the way through the process:
Feb 29 22:15:10 colo2 amavis[27834]: (27834-18) ESMTP::10024 /var/amavisd/tmp/amavis-20080229T194128-27834: <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net> Received: SIZE=3425 from mx1.dixongroup.net ([127.0.0.1]) by localhost (mx1.dixongroup.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 27834-18 for <cyberscots.cluebringer.com@localhost.dixongroup.net>; Fri, 29 Feb 2008 22:15:10 -0500 (EST) Feb 29 22:15:10 colo2 amavis[27834]: (27834-18) Checking: f8-BzF9WBjIQ <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net> Feb 29 22:15:11 colo2 amavis[27834]: (27834-18) FWD via SMTP: <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net>, 250 2.6.0 Ok, id=27834-18, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 2177591EC2 Feb 29 22:15:11 colo2 amavis[27834]: (27834-18) Passed CLEAN, [71.178.250.158] <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net>, Message-ID: <6F891DB1-E3FE-4EC1-8945-76CDC41D9236@lbd.org>, mail_id: f8-BzF9WBjIQ, Hits: -, 671 ms Feb 29 22:15:11 colo2 postfix/smtp[11051]: 7D8FD91EC1: to=<cyberscots.cluebringer.com@localhost.dixongroup.net>, orig_to=<Cyberscots@cluebringer.com>, relay=127.0.0.1[127.0.0.1]:10024, delay=0.97, delays=0.28/0/0.01/0.68, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=27834-18, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 2177591EC2) Feb 29 22:15:11 colo2 postfix/local[406]: 2177591EC2: to=<cyberscots.cluebringer.com@localhost.dixongroup.net>, relay=local, delay=0.71, delays=0.21/0.1/0/0.4, dsn=2.0.0, status=sent (delivered to command: /usr/local/lib/mailman/mail/mailman post cyberscots) Feb 29 22:16:44 colo2 amavis[28675]: (28675-20) ESMTP::10024 /var/amavisd/tmp/amavis-20080229T193117-28675: <pmlynch@magellanhealth.com> -> <cyberscots.cluebringer.com@localhost.dixongroup.net> Received: SIZE=1846 from mx1.dixongroup.net ([127.0.0.1]) by localhost (mx1.dixongroup.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 28675-20 for <cyberscots.cluebringer.com@localhost.dixongroup.net>; Fri, 29 Feb 2008 22:16:44 -0500 (EST) Feb 29 22:16:45 colo2 amavis[28675]: (28675-20) Checking: efIbYYLyUd73 <pmlynch@magellanhealth.com> -> <cyberscots.cluebringer.com@localhost.dixongroup.net> Feb 29 22:16:46 colo2 amavis[28675]: (28675-20) FWD via SMTP: <pmlynch@magellanhealth.com> -> <cyberscots.cluebringer.com@localhost.dixongroup.net>, 250 2.6.0 Ok, id=28675-20, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 09BAE91EC2 Feb 29 22:16:46 colo2 amavis[28675]: (28675-20) Passed CLEAN, [204.193.93.43] <pmlynch@magellanhealth.com> -> <cyberscots.cluebringer.com@localhost.dixongroup.net>, Message-ID: <3F84D5A4D6A79946B4A3A9A00529883B06447C2E@colmdex5.mbh.mhs.magellanhealth.com>, mail_id: efIbYYLyUd73, Hits: -2.417, 1324 ms Feb 29 22:16:46 colo2 postfix/smtp[11051]: DB1A691EC1: to=<cyberscots.cluebringer.com@localhost.dixongroup.net>, orig_to=<cyberscots@cluebringer.com>, relay=127.0.0.1[127.0.0.1]:10024, delay=1.5, delays=0.13/0/0.01/1.3, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=28675-20, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 09BAE91EC2) Feb 29 22:16:46 colo2 postfix/local[406]: 09BAE91EC2: to=<cyberscots.cluebringer.com@localhost.dixongroup.net>, relay=local, delay=0.8, delays=0.2/0/0/0.6, dsn=2.0.0, status=sent (delivered to command: /usr/local/lib/mailman/mail/mailman post cyberscots)
In short, it looks as if messages are being received, and they are being delivered to the outbound queue, but nothing is happening beyond that point. Any ideas? Please help!
Thank you, Perry
____________________________________________________________________________________
Looking for last minute shopping deals?
Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping
Perry M Lynch wrote:
On Wednesday, a member submitted a large email with poor formatting to the list. And it had enough CCs attached to it that it caused a buffer overflow, which resulted in the message being self-approved for the list.
Do you have evidence that this is what happened, or is this just conjecture?
Suddenly the server began repeating itself as it tried to process this huge message over and over again. It became a vicious cycle - as the one message finally repeated out, other messages that were caught in the queue became ammunition for the server to repeat. Prior to the system crash, more than 6100 duplicate messages were sent to 450 members, at an estimated peak rate of about 20,000 outbound messages per half hour.
The user db became corrupt, and my server admin and I have worked to get things operational again. At this point, the list is supposedly behaving itself, yet no messages are actually being sent from the server. I don't have access to the raw logs,
Which is unfortunate, because that's what we need to see to determine what happened. I would like to see Mailman's error, post, qrunner, smtp, smtp-failure and vette logs from the time of the original failure to try to see what happened and if there is a Mailman bug of some kind that allowed it. Can you get these logs from the server admin?
but here is the snippet of discussion about the restoration of the user database.
Jason: here's what I see on the server.
# pwd /var/spool/mailman/lists/cyberscots # file * config.pck: data config.pck.last: data digest.mbox: UTF-8 Unicode mail text en: setgid directory pending.pck: ASCII text request.pck: data
and the restore at home...
/backups/spool/mailman/lists/cyberscots # file * config.pck: smtp mail text config.pck.last: smtp mail text digest.mbox: ISO-8859 mail text en: directory pending.pck: smtp mail text request.pck: data
the *.pck files from the restore all look like regular mbox files.
Which is totally wrong, because they should be Python pickles.
but the ones on the server are some sort of binary config file.
Which is what they should be.
I checked against a different list, same thing. on the server... /var/spool/mailman/lists/lynches # file *
config.pck: data config.pck.last: datawe got a user db restored,
What user db? In standard Mailman, the user db, list config, etc. is all part of the list object saved in the config.pck. Do you have a custom MemberAdaptor that uses a different, separate user db? or are you just referring to the config.pck?
at which point the gui became functional again, and we cleared the outbound queue,
Mailman's out queue or the Postfix queue?
which had 1400+ messages that were blocked by a provider who'd had enough. And now, although I can send emails to the list, they do not seem to make it all the way through the process:
Is Mailman running? See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.078.htp>.
Feb 29 22:15:10 colo2 amavis[27834]: (27834-18) ESMTP::10024 /var/amavisd/tmp/amavis-20080229T194128-27834: <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net> Received: SIZE=3425 from mx1.dixongroup.net ([127.0.0.1]) by localhost (mx1.dixongroup.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 27834-18 for <cyberscots.cluebringer.com@localhost.dixongroup.net>; Fri, 29 Feb 2008 22:15:10 -0500 (EST) Feb 29 22:15:10 colo2 amavis[27834]: (27834-18) Checking: f8-BzF9WBjIQ <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net> Feb 29 22:15:11 colo2 amavis[27834]: (27834-18) FWD via SMTP: <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net>, 250 2.6.0 Ok, id=27834-18, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 2177591EC2 Feb 29 22:15:11 colo2 amavis[27834]: (27834-18) Passed CLEAN, [71.178.250.158] <pmlynch@lbd.org> -> <cyberscots.cluebringer.com@localhost.dixongroup.net>, Message-ID: <6F891DB1-E3FE-4EC1-8945-76CDC41D9236@lbd.org>, mail_id: f8-BzF9WBjIQ, Hits: -, 671 ms Feb 29 22:15:11 colo2 postfix/smtp[11051]: 7D8FD91EC1: to=<cyberscots.cluebringer.com@localhost.dixongroup.net>, orig_to=<Cyberscots@cluebringer.com>, relay=127.0.0.1[127.0.0.1]:10024, delay=0.97, delays=0.28/0/0.01/0.68, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=27834-18, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 2177591EC2) Feb 29 22:15:11 colo2 postfix/local[406]: 2177591EC2: to=<cyberscots.cluebringer.com@localhost.dixongroup.net>, relay=local, delay=0.71, delays=0.21/0.1/0/0.4, dsn=2.0.0, status=sent (delivered to command: /usr/local/lib/mailman/mail/mailman post cyberscots)
OK. The post was delivered to the Mailman wrapper.
<snip>
In short, it looks as if messages are being received, and they are being delivered to the outbound queue, but nothing is happening beyond that point.
Actually, they are being delivered to Mailman's mail wrapper which terminates normally. This means they have been queued in Mailman's incoming queue. See the FAQ linked above for the next steps (you can skip steps 2 and 3 because this is Postfix, and we know delivery to Mailman is working).
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (2)
-
Mark Sapiro
-
Perry M Lynch