Manipulate mailman in / out queue (from mailman-users)
[Redirecting part of this discussion from mailman-users to mailman-developers. Mark does a great job of explaining the gory details of the queue runners.]
On Oct 16, 2012, at 09:23 PM, Mark Sapiro wrote:
First of all, the actual physical size of the queue directory impacts processing. Every time an entry is added to the queue, and every time a .pck file is renamed to .bak, the entire physical directory must be searched to ensure this isn't a duplicate name. Depending on OS settings, cache sizes and the physical directory size, this may actually involve multiple disk reads each time. Thus, if the qfiles/out/ directory has grown large because 8000+ messages were added to the queue when the runner couldn't handle them (and there may have been more in the retry/ queue because of SMTP failures), it would benefit from shrinking. This is accomplished by moving (mv) or renaming the queue directory itself aside, not just its contents, and then letting the runner recreate it when it starts. Then, if necessary, move messages back a few at a time so the directory doesn't grow large again.
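The move-aside procedure Mark describes could be scripted roughly as follows. This is only a sketch under assumptions: the function names, the `batch_size` default, and the idea of calling it from a script are all hypothetical and not part of Mailman, and it presumes the runner is stopped while the directory is renamed.

```python
import shutil
from pathlib import Path


def move_queue_aside(queue_dir, aside_dir):
    """Rename the whole queue directory (not just its contents) out of the way."""
    queue_dir, aside_dir = Path(queue_dir), Path(aside_dir)
    if aside_dir.exists():
        raise FileExistsError(f"{aside_dir} already exists")
    queue_dir.rename(aside_dir)
    return aside_dir


def move_back_batch(aside_dir, queue_dir, batch_size=100):
    """Move at most batch_size .pck entries back into the live queue."""
    aside_dir, queue_dir = Path(aside_dir), Path(queue_dir)
    # Normally the runner recreates the directory on start; creating it
    # here only covers the case where it has not done so yet.
    queue_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    # Sorted by file name so repeated runs proceed in a deterministic order.
    for entry in sorted(aside_dir.glob("*.pck")):
        if moved >= batch_size:
            break
        shutil.move(str(entry), str(queue_dir / entry.name))
        moved += 1
    return moved
```

Calling `move_back_batch` repeatedly, and waiting for the live queue to drain between calls, keeps the new directory small, which is the point of the exercise.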
I'd like to begin to explore ways to make this automatically more manageable, so that when problems occur (e.g. upstream SMTPd not responding or slow), Mailman can recognize and respond more efficiently to the problem.
The first question to ask is whether this mostly affects the outgoing queue, or whether other queues *in practice* can suffer the same overloading? I suspect that the incoming queue could get filled quickly if Mailman is slow to handle an onslaught of new messages. My sense is that in general, it's the outgoing queue that gives people the most problems though.
The switchboard in Mailman 3 is largely the same as in Mailman 2 (modulo updating the code to more modern Pythons).
Before describing my own thoughts, I'd like to open it up and get a sense of your experiences, and strategies you think might help manage big queues.
Cheers, -Barry
Hello Barry,
I have seen both the in and out queues get overloaded in our environment. We even have a Nagios plugin that monitors the size of Mailman's in and out queues and sends alerts when a queue reaches a threshold.
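A queue-size check of this kind might look like the sketch below. It is not the plugin mentioned above; the function names and the `warn`/`crit` thresholds are made-up placeholders. The only conventions it relies on are the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the fact that queue entries are .pck files.

```python
from pathlib import Path

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def queue_size(queue_dir):
    """Count the .pck entries currently sitting in one queue directory."""
    return sum(1 for _ in Path(queue_dir).glob("*.pck"))


def check_queue(queue_dir, warn=500, crit=2000):
    """Return (exit_code, status_line) for a single queue directory.

    The warn/crit defaults are arbitrary placeholders, not recommendations.
    """
    if not Path(queue_dir).is_dir():
        return UNKNOWN, f"UNKNOWN - {queue_dir} is not a directory"
    count = queue_size(queue_dir)
    if count >= crit:
        return CRITICAL, f"CRITICAL - {count} entries in {queue_dir}"
    if count >= warn:
        return WARNING, f"WARNING - {count} entries in {queue_dir}"
    return OK, f"OK - {count} entries in {queue_dir}"
```

A wrapper script would print the status line and exit with the returned code, once for qfiles/in and once for qfiles/out.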
If we get mailbombed (for example, lists get hijacked and spammed, or lists receive runaway automated alerts from programs that have no alert interval), a few thousand messages can be pumped into the 'in' queue in a very short time.
In a situation like this, I find the message pattern, stop the Mailman service, and use a shell script to scan the 'in' queue and remove the matching messages. Sometimes we also need to block the sending machine at the incoming MTA to stop the messages from reaching Mailman.
It would be helpful to have a utility that reports the status of the in queue (the top senders, for example) and a tool to remove those messages.
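As a rough illustration of the "top senders" report, the sketch below tallies From addresses with Python's standard `email` package. It makes a significant assumption: the queued messages have already been extracted as raw message bytes by some other means, since the .pck files themselves are pickles rather than plain message text. `top_senders` is a hypothetical name, not an existing Mailman tool.

```python
from collections import Counter
from email import message_from_bytes
from email.utils import parseaddr


def top_senders(raw_messages, n=5):
    """Return the n most frequent From addresses as (address, count) pairs.

    raw_messages is any iterable of raw message bytes (an assumption:
    extracting them from the queue files is outside this sketch).
    """
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_bytes(raw)
        # parseaddr splits "Name <addr>" into (name, addr); str() guards
        # against header objects returned for non-ASCII header values.
        _, addr = parseaddr(str(msg.get("From", "")))
        counts[addr.lower() or "(unknown)"] += 1
    return counts.most_common(n)
```

The same tally could drive a removal step, for example by listing the entries whose sender exceeds some count, but deciding what to delete is best left to the operator.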
The outgoing queue piles up when the outgoing MTA has trouble accepting messages. In these cases there is nothing to remove from the queue, and manually managing the out queue's size becomes necessary. I think tools or automation that help address the problem quickly and reliably, before it advances to a service outage, would be a very important improvement!
Xueshan
On Wed, Oct 17, 2012 at 7:37 AM, Barry Warsaw barry@list.org wrote:
-- Xueshan Feng Infrastructure Delivery Group, IT Services Stanford University