[Mailman-Users] locks, max cpu, postfix, qrunner ugliness

Morgan Fletcher morgan at hahaha.org
Mon Feb 18 17:46:24 CET 2002

Charlie Watts <cewatts at frontier.net> writes:
>> I 'ktrace -dp pid', waited five minutes, did 'ktrace -c', then 'kdump |
>> less'.
> <snip ktrace output with qrunner busy-waiting>
> Try taking qrunner out of cron, sending a message to the list, and
> starting qrunner under ktrace. Then do the 'kdump | less' and you should
> be able to see where it changes to busy-waiting ...

That was done with no qrunner in crontab at the time. I did get more
interesting data when I launched qrunner with ktrace, but still couldn't
track it down. 

> Is it a lock issue?

After each HUP I'd check for mailman processes, wait for there to be none,
then clean out the locks.

> Hrm. I've got 11k users in a list, and know of folks with 100k user lists.

I believe I fixed the problem. I don't know the exact cause. I sent a
message via script to each of the 1800+ subscribers to the majordomo
list. I got back about 200 bounces. I removed all of those addresses from
the mailman list. I grepped through the qfiles directory for bounce
messages, including one really crazy bounce caused by a majordomo list
server address being subscribed. (Majordomo was replying, quoting the
welcome message with "> ", over and over and over. I wonder if Python saw
redirects in all the garbage.) I shrank the number of files in qfiles from
over 800 to around 60 human-authored posts. Ran qrunner. It succeeded. I
put the cron job back in place that runs it every minute, and it's been
humming along for about 10 hours with no problem.
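
For anyone who wants to do the same sweep without eyeballing grep output, here
is a rough sketch of the idea in Python. It is not what I actually ran, and it
is not Mailman's own bounce detection. The ".msg" suffix, the header patterns,
and the function names are all assumptions for illustration; check what your
queue files really look like before trusting it, and note that it only lists
candidates and deletes nothing.

```python
import os
import re

# Heuristic header patterns for bounces. These are assumptions chosen for
# illustration, not Mailman's bounce-detection rules.
BOUNCE_PATTERNS = [
    re.compile(r"^From:.*(MAILER-DAEMON|postmaster)",
               re.IGNORECASE | re.MULTILINE),
    re.compile(r"^Subject:.*(undeliverable|returned mail|delivery (status|failure))",
               re.IGNORECASE | re.MULTILINE),
]


def looks_like_bounce(text):
    """Return True if the header block matches any bounce pattern."""
    # Only inspect the headers (everything before the first blank line),
    # so quoted text in a human-authored post does not trigger a match.
    header = text.split("\n\n", 1)[0]
    return any(p.search(header) for p in BOUNCE_PATTERNS)


def find_bounces(qdir, suffix=".msg"):
    """List files in qdir (hypothetical plain-text queue files ending in
    suffix) whose headers look like a bounce. Nothing is removed."""
    hits = []
    for name in sorted(os.listdir(qdir)):
        if not name.endswith(suffix):
            continue
        path = os.path.join(qdir, name)
        with open(path, "r", errors="replace") as fh:
            if looks_like_bounce(fh.read()):
                hits.append(path)
    return hits
```

Calling find_bounces() on a copy of the qfiles directory gives a list to review
by hand before moving anything out of the queue.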

This would probably be a good test case for mailman: add some percentage of
known dead addresses to a list and see what mailman does. 
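
A possible way to build such a test membership, as a sketch only: the function
name and the "deadNNNN@example.invalid" addresses are made up for illustration
(the .invalid top-level domain is reserved, so nothing sent there can be
delivered). It only produces the address list; feeding it to a list and
watching the queue is left to whatever tools your install provides.

```python
import random


def build_test_members(live_addresses, dead_fraction, seed=0):
    """Return (members, dead): the live addresses plus enough synthetic
    undeliverable ones that dead addresses make up roughly dead_fraction
    of the combined list."""
    if not 0 <= dead_fraction < 1:
        raise ValueError("dead_fraction must be in [0, 1)")
    n_live = len(live_addresses)
    # dead / (live + dead) = f   =>   dead = live * f / (1 - f)
    n_dead = round(n_live * dead_fraction / (1 - dead_fraction))
    # Hypothetical addresses under the reserved .invalid TLD.
    dead = ["dead%04d@example.invalid" % i for i in range(n_dead)]
    members = list(live_addresses) + dead
    # Shuffle with a fixed seed so the dead addresses are interleaved
    # the same way on every run.
    random.Random(seed).shuffle(members)
    return members, dead
```

In my case roughly 200 of 1800 addresses bounced, so a dead_fraction of about
0.11 would approximate what this list looked like.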

Thanks for your help, it seems to be working now.

