[Mailman-Users] Postfix/Mailman breaks (somewhat) upon sending out to list of 300+ users
Ted M Harapat
ted-mailman at mob.net
Fri Nov 9 21:04:11 CET 2001
(These emails always get so damned long, but it's hard to explain in detail
what's going wrong or how to fix anything otherwise. Sorry.)
Well.... my father entered them into the list, and he did it through the web
interface. I think that counts as manually. I don't know what the 54th one is,
and I didn't look, because the list was entered alphabetically, and when it went
out, Mailman (and eventually Postfix) sent it in chunks based on domain name (as
all good MTAs should). Those domains were scattered all over the alphabetical
list (since it wasn't sorted by domain).
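For anyone puzzling over the ordering, here is a minimal sketch of the kind of batching described above: group recipients by domain, then split each group into batches no larger than a per-transaction recipient limit. This is a hypothetical illustration, not Mailman's or Postfix's actual delivery code; `chunk_by_domain` and `max_rcpts` are made-up names.

```python
from collections import defaultdict


def chunk_by_domain(recipients, max_rcpts):
    """Group addresses by domain, then split each group into batches.

    Hypothetical helper: shows why delivery order follows domains,
    not the alphabetical order of the subscriber list.
    """
    if max_rcpts < 1:
        raise ValueError("max_rcpts must be at least 1")
    by_domain = defaultdict(list)
    for addr in recipients:
        # Everything after the last "@" is treated as the (case-insensitive) domain.
        domain = addr.rsplit("@", 1)[-1].lower()
        by_domain[domain].append(addr)
    batches = []
    for domain in sorted(by_domain):
        addrs = by_domain[domain]
        # Slice each domain's addresses into batches of at most max_rcpts.
        for start in range(0, len(addrs), max_rcpts):
            batches.append(addrs[start:start + max_rcpts])
    return batches


if __name__ == "__main__":
    subscribers = ["amy@b.com", "bob@a.org", "cat@b.com", "dan@a.org", "eve@a.org"]
    for batch in chunk_by_domain(subscribers, 2):
        print(batch)
```

In this toy run the alphabetically first subscriber, amy@b.com, ends up in the last batch, which is why an address's position in the web-interface listing says little about where it falls in delivery order.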
As for the system: it's brand new (sort of). It's a brand new Linux install (of
a current distro released less than 2 months ago) on an old dual Pentium Pro 200
box with over 75GB of disk (almost all of it ReiserFS), of which only about 7GB
is used right now. Over half the disk (especially where the system files
reside) is UltraWide SCSI at 10k rpm. Memory is 320MB with 256MB of swap. The
system runs lots of services but doesn't even come close to running out of CPU,
disk space, I/O, memory, or administrator patience (most of the time).
Here's what else I've come up with. (And I wish more people would report fixes
back to the list after they post a question and then figure it out themselves.)
This has to do with the qrunner lock files. It appears that, as for so many
others, the Mailman tool (or something) messes something up and creates a lock
file in /home/mailman/locks that corresponds to a running qrunner process. And
no more of the queue can be processed until that run has completed.
So you can do one of two things: wait until it finishes, which could take close
to forever (I never have waited it out), or kill it manually, delete the qrunner
lock files, and rerun qrunner. Sometimes it then processes the smaller (non-300+
user) lists and those go out, but the remaining 257 users of my dad's list are
still stuck in those hard-to-read qfiles. I'm fairly sure that's what is slowing
up the system, so those qrunner processes never run and the lock files never go
away.
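Before killing anything, it may help to check whether the process that holds a lock is still alive at all. A minimal sketch, under an assumption worth checking against `ls /home/mailman/locks` first: that each lock holder leaves a companion file named `<lock>.<hostname>.<pid>` (which appears to be how Mailman 2.x names them). `parse_lock_name`, `pid_alive` and `stale_locks` are hypothetical helpers; the sketch only reports, it does not delete files or kill processes, and it is POSIX-only.

```python
import os
import re

# ASSUMPTION: per-process lock files look like "qrunner.lock.<hostname>.<pid>".
# The hostname may itself contain dots, so the PID is taken from the last field.
LOCK_NAME = re.compile(r"^(?P<base>.+?\.lock)\.(?P<host>.+)\.(?P<pid>\d+)$")


def parse_lock_name(name):
    """Return (base, host, pid) for a per-process lock file name, or None."""
    match = LOCK_NAME.match(name)
    if match is None:
        return None
    return match.group("base"), match.group("host"), int(match.group("pid"))


def pid_alive(pid):
    """POSIX only: signal 0 checks that the process exists without signalling it."""
    if pid <= 0:
        # 0 would address the whole process group, so treat it as "not a holder".
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def stale_locks(lock_dir):
    """List lock files whose recorded PID no longer exists (report only)."""
    stale = []
    for name in sorted(os.listdir(lock_dir)):
        parsed = parse_lock_name(name)
        if parsed is not None and not pid_alive(parsed[2]):
            stale.append(name)
    return stale


if __name__ == "__main__":
    lock_dir = "/home/mailman/locks"  # path from the message above
    if os.path.isdir(lock_dir):
        for name in stale_locks(lock_dir):
            print("stale:", name)
```

If a lock's PID is still alive, the qrunner is at least running (perhaps slowly); if it is gone, the lock is left over from a dead process, which is the case where removing it by hand is more defensible.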
So I stayed up much later than I should have last night, and here's what I did.
I first raised the smtpd_recipient_limit variable in /etc/postfix/main.cf from
100 to 1000. That didn't appear to help. So then I set root's cron to run the
Postfix-supplied "/etc/rc.d/init.d/postfix restart" command every 5 minutes.
This reloads everything and seems to allow that qrunner process to complete(?)
or just die. Since the lock file goes away and qrunner starts again the next
minute when cron calls it, there's a small chance that it will process mail
going to the other, smaller lists on my server. (This is part of the mystery:
how does restarting Postfix release that qrunner lock file so the mail finally
goes out?!) By then I was tired and wanted sleep, so I went to bed and left
root's cron doing that restart. In the morning, all the mail going to the small
lists (which I'm subscribed to) had gone out and I had received it all. That
was my plan, and I was happy. I was hoping someone would reply to my messages
with some magical fix for my dad's list.
Then my dad called me all excited this morning, saying that both emails to his
list of 300 went out to the remaining 257 list members (that's 514 emails
total). But he said they didn't go out until nearly 7am, and I hadn't changed
anything since around 1:30am. So it took 5.5 hours to process all of that?! If
so, was it just that I was stopping that stuck qrunner process every five
minutes, or was it a combination of that plus the new smtpd_recipient_limit
value that made it go?
Oh, and Jon, I did find your shell script (the one about Jeff B and a
misconfigured browser) in a list archive, with the for loop showing how to
delete the qrunner locks and kill the processes. I touched up some of the syntax
for my OS and ls output, but that didn't work for me. It just seemed to never
process anything. Perhaps I was impatient.
So.... I'm still stuck with qrunner lock files, even though I do have this
temporary workaround. Oh, and the lock files are gone now that my entire list
has processed the qfiles. I imagine that I could stop restarting postfix so
Living with a mysterious fix,
Quoting Jon Carnes <jonc at haht.com>:
> Just out of left field, but when you put in the mailing addresses for
> dad's list, how did you do it? Manually, or did you feed them in via
> "add_members"? Check the list and see what the 53rd and 54th address
> then go back to your import list and check out those addresses for
> Might not help, but it's certainly something to look at while you are
> waiting for inspiration.
> Another thought - how much space does your server have available (df)?
> is your memory on your server (top)? Could you be running out of
> Jon Carnes
> ----- Original Message -----
> From: "Ted M Harapat" <ted-mailman at mob.net>
> To: <mailman-users at python.org>
> Sent: Thursday, November 08, 2001 3:17 PM
> Subject: [Mailman-Users] Postfix/Mailman breaks (somewhat) upon sending out to list of 300+ users
> > Hello all. Any suggestions and ideas on this problem are welcome.
> > First of all, I recently switched to postfix after many years with sendmail and
> > qmail. I really like it. So then I got majordomo working with it,
> > suggested I try Mailman and I really love this system. Thank you FSF
> > So it's easy enough to set up with a little looking around. I set it and set
> > up 4 lists on it. 3 of them with under 20 people on it, and one (for
> > over 300 people. The smaller lists work perfectly. That is, until my
> > sends out to his list of 300+ people. According to the mail logs,
> > mail to the server is received to the list and it is sent back to my father for
> > approval (as I have set it up intentionally). So he approves it, and
> > emails out to exactly 53 of the 300+ members. Then it stops sending
> > errors (I've checked all of postfix's and syslog's logs). Not only
> > everything else with Mailman is then foobarred. Now when any of the
> > lists send to it, postfix records receiving it but then it doesn't do
> > step such as sending it out to the users on the list or going to the
> > approval. It (Mailman) just stops dead cold. No more outgoing traffic
> > explaining why.
> > So, upon examining every log I could think of, I finally just Reload
> > using the included scripts. Then, all of a sudden, everything starts
> > processing. All mail waiting to go to the admins for approval or to go
> > to the end users on all the different lists are suddenly sent out as
> > and efficiently as everything normally goes with Mailman. All mail
> > remaining 240+ people on my dad's list. Mail to those listmembers
> > disappeared.
> > So.... I decided to try this again. Same thing. Dad sends mail out to
> > it goes to exactly 53 people and dies again, and makes Postfix go
> > Outside of this combined Postfix/Mailman problem, the mail server acts
> > normal, processing all other traffic. I almost suspect that this is
> > of a Mailman than postfix problem. I think I'm good enough with postfix to
> > if its a problem there and it doesn't appear to be. But I can't be
> > it is partially resolved just by my reloading the MTA.
> > Strangely enough, these 53 users are the exact same 53 from the
> > I've checked over mailing lists for the last few months reading a lot
> > (since what an email of this subject would be called) and didn't
> > And nothing on this is in the FAQs or Manuals that I can tell.
> > Anyone have any idea? Anything at all?
> > HELP!
> > -ted