Hello,
I had a big problem with Mailman yesterday and today: two or three hours after I sent out my "big list" message (150 000 subscribers), Mailman started to consume all the CPU, for ages. In fact, the bounce runner had started to send out the "probe" messages one by one, making VirginRunner lock the list repeatedly and hold the lock for 15 s each time. Meanwhile the web admin interface was unreachable (for that list), and other scripts waited and then failed.
I looked everywhere to find out what was happening, finally cleaned all the probes out of the qfiles/virgin/ queue, and waited another two hours for the qfiles/command/ queue to empty... not very good.
As I have a very big server (4 CPUs, 6Gbytes of RAM), I suppose that only a MySQL-backed Mailman would answer this problem. How far is it from being usable?
-- Fil
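A quick way to watch a backlog like the one described above is to count the entries sitting in each queue directory. The following is only a sketch: the qfiles location and the queue names passed in are assumptions taken from this message, and the helper name queue_backlog is made up for illustration.

```python
import os

def queue_backlog(qfiles_dir, queue_names):
    """Return {queue name: number of entries}, or None if the queue is missing."""
    counts = {}
    for name in queue_names:
        path = os.path.join(qfiles_dir, name)
        try:
            # Count every non-hidden entry; each one is a pending queue file.
            counts[name] = sum(
                1 for entry in os.listdir(path) if not entry.startswith(".")
            )
        except OSError:
            # Directory absent or unreadable on this installation.
            counts[name] = None
    return counts

if __name__ == "__main__":
    # Hypothetical install prefix; adjust to the local Mailman layout.
    print(queue_backlog("/usr/local/mailman/qfiles", ["virgin", "command"]))
```

Run periodically, this shows whether a queue is draining or still growing.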
Hi,
You may want to disable VERP_PROBES by updating your Mailman to the latest CVS. Probes were introduced in 2.1.5 but have been made optional for backward compatibility. I remember you updated around 16 Oct, but the CVS commit for this fix was made on the 22nd. (Remember that you need to specify Release_2_1-maint.)
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
In fact I would like to disable probes altogether, on all lists. Is this done with a cron job, or with the bounce runner?
-- Fil
In fact mailman (CVS) will not send probes if VERP_PROBES = No. :-)
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
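For readers following along, here is a minimal sketch of how the setting mentioned above is usually applied. It assumes a Mailman 2.1 checkout recent enough to know the VERP_PROBES variable (the Release_2_1-maint CVS discussed here) and that site-wide overrides live in Mailman/mm_cfg.py; the location of that file on a given server is an assumption.

```python
# Mailman/mm_cfg.py -- site-wide overrides (fragment, not a complete file)
from Defaults import *   # standard first line; provides the Yes/No constants

# Turn off VERP bounce probes for every list on this server.
VERP_PROBES = No
```

The qrunners are long-running processes, so they most likely need a restart (for example with bin/mailmanctl restart) before the change takes effect.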
What's weird now is that it processes things just "as if" it were sending probes, although in fact it doesn't send them. That doesn't solve my CPU/lock problem :-)
logs/bounce is full of:
Oct 29 09:41:59 2004 (21355) renxxxxxx@xxxxxx.fr: info-diplo current bounce score: 3.0
Oct 29 09:41:59 2004 (21355) sending info-diplo list probe to: renxxxxxx@xxxxxx.fr (score 3.0 >= 3.0)
-- Fil
Are you sure you restarted the Mailman qrunners? "list probe" no longer appears in my log. Instead, I see entries like this:
Oct 28 10:42:06 2004 (55889) seppyo-talk: xxxx@xxxx disabling due to bounce score 5.0 >= 5.0
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
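The two kinds of log entry quoted in this exchange can be told apart mechanically, which makes it easy to check whether probes are still being generated after a change. This is a sketch only: the regular expressions are modelled on nothing more than the sample lines shown in this thread, and the function name is hypothetical.

```python
import re
from collections import Counter

# Patterns derived only from the sample lines quoted in this thread.
PROBE_RE = re.compile(r"sending (?P<list>\S+) list probe to: (?P<addr>\S+)")
DISABLE_RE = re.compile(
    r"\) (?P<list>\S+): (?P<addr>\S+) disabling due to bounce score"
)

def classify_bounce_log(lines):
    """Count 'probe' and 'disable' events per list name."""
    counts = Counter()
    for line in lines:
        match = PROBE_RE.search(line)
        if match:
            counts[(match.group("list"), "probe")] += 1
            continue
        match = DISABLE_RE.search(line)
        if match:
            counts[(match.group("list"), "disable")] += 1
    return counts

if __name__ == "__main__":
    # Assumed log location; adjust to the local installation.
    with open("logs/bounce") as log:
        for (listname, kind), n in sorted(classify_bounce_log(log).items()):
            print(listname, kind, n)
```

If the "probe" count keeps rising after the qrunners were restarted, the running code is probably still the old version.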
Yes. But I hadn't updated from CVS :-) Sorry for the trouble.
-- Fil
At 3:12 PM +0200 2004-10-28, Fil wrote:
As I have a very big server (4 CPUs, 6Gbytes of RAM), I suppose that only a MySQL-backed Mailman would answer this problem. How far is it from being usable?
For a large-scale mail system, number and speed of CPUs is meaningless. Amount of RAM doesn't really mean a whole lot. What you need is disk I/O capacity.
Have you put your entire spool filesystem on solid-state disk, or at least very high capacity multi-spindle RAID 1+0 arrays with high speed battery-backed controllers with large quantities of write-back cache?
See also <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq06.003.htp>, as one of many items in the Mailman FAQ that discuss "performance". Of course, you should also see the others, too.
-- Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
Have you put your entire spool filesystem on solid-state disk, or at least very high capacity multi-spindle RAID 1+0 arrays with high speed battery-backed controllers with large quantities of write-back cache?
I don't understand this language, sorry. We have just upgraded the I/O capacity of the server, and it's working fine with every other piece of software, no problem with postfix handling tons of emails for example.
But still, when we're talking about a 15-60 second lock for each subscriber, even if we could make the hardware run twice as fast, it's way too slow. That's why I believe only the database backend (or some other software modification) can help. What's especially weird is that probe sending is very costly, while personalization and VERP delivery are not that costly.
-- Fil
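The argument above is easy to check with back-of-the-envelope arithmetic. The sketch below assumes probes are sent strictly one at a time and that 1% of the 150 000 subscribers receive one; that bounce rate is purely hypothetical, since the thread gives no figure.

```python
def total_lock_hours(num_probes, seconds_per_probe):
    """Hours the list lock is held if probes are sent one by one."""
    return num_probes * seconds_per_probe / 3600.0

if __name__ == "__main__":
    subscribers = 150000
    num_probes = subscribers // 100  # assumed 1% bounce rate (hypothetical)
    # 15 s and 60 s are the figures from the thread; 7.5 s models
    # hardware that is twice as fast as the 15 s case.
    for secs in (15, 60, 7.5):
        hours = total_lock_hours(num_probes, secs)
        print("%5.1f s per probe -> %.2f h of lock time" % (secs, hours))
```

Under these assumptions even the halved figure leaves the list locked for hours, which is why faster hardware alone does not change the picture much.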
At 4:37 PM +0200 2004-10-28, Fil wrote:
I don't understand this language, sorry. We have just upgraded the I/O capacity of the server, and it's working fine with every other piece of software, no problem with postfix handling tons of emails for example.
The same sorts of problems generally face Mailman as face other programs, and the same sorts of solutions are usually useful for Mailman. So, solid-state disks, high-capacity RAID 1+0 arrays with high-speed controllers and large amounts of battery-backed write-back cache, etc....
If you're confused about a lot of these terms, I'd recommend reading the book by Nick Christenson entitled _sendmail Performance Tuning_, which is linked from that first page I mentioned.
But still, when we're talking about a 15-60 second lock for each subscriber, even if we could make the hardware run twice as fast, it's way too slow.
I think it would be profitable to find out why VirginRunner is holding the lock this long.
That's why I believe only the database backend (or some other software modification) can help. What's especially weird is that probe sending is very costly, while personalization and VERP delivery are not that costly.
There are people who have done database back-end stuff for Mailman 2.x, but so far as I know none of that has been integrated back into the mainstream code base. Those are all one-off operations. There will be a database back-end built into Mailman3, but it's still in the design phase right now.
If you can't find out why VirginRunner is holding the locks so long, then the best thing to do would be turn off the VERP bounce probes.
-- Brad Knowles, <brad@stop.mail-abuse.org>
participants (3): Brad Knowles, Fil, Tokio Kikuchi