Re: Slow Performance on semi-large lists
I shifted this to mailman-developers because I want to talk about changes in qrunner that D.J. Atkinson brought up.
On 2000.12.13, in <Pine.SOL.4.05.10012131455360.22660-100000@babu.pcisys.net>, "D.J. Atkinson" <dj@pcisys.net> wrote:
I posted a message over the weekend where I saw qrunner only processing part of the queue. It turned out that there were three messages in the queue, each with three unresolvable names (all three messages to the same list). Each of these queued files took 400 seconds to time out, by which time we were past the default maximum qrunner process length (15 minutes), and qrunner exited.
I've of course now increased the process length to 30 minutes, and everything seems to be OK. But that's only temporary, I'm sure. As list volume builds, it will become a problem again. It would be great if there were a more graceful way of dealing with this than currently exists.
How about altering qrunner's algorithm to split the queue on timeout, appending the head of the queue to the tail?
A - fails
B - succeeds
C - fails
D - fails/unprocessed; qrunner times out
E - unprocessed
F - unprocessed
With this change, your next queue runner will process this queue:
E F A C D
Eventually (ahem) the queue will contain only those batches which are hard to deliver, and they'll be re-ordered with each run to give equal attempts over time.
Actually, that's not true if the queue is reduced to containing only A, C, and D, and qrunner always times out on D; D will never get the same time as A and C. Leaving D at the head of the queue (that is, splitting the queue ahead of the current batch, rather than behind it) solves that problem, until the case occurs in which D contains enough bad or slow addresses to stop the queue even though it's first. Two solutions to this: 1) never stop qrunner during the first queued batch (always wait for it to exit); or 2) split the queue ahead of or behind the current batch at random.
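The rotation described above can be sketched in a few lines of Python. This is a hypothetical helper, not actual Mailman code; the function name and arguments are made up for illustration:

```python
import random

def requeue_on_timeout(queue, current_index, ahead):
    """Rotate the queue when qrunner hits its process lifetime.

    `queue` is the list of still-undelivered batches; `current_index`
    points at the batch that was in progress at timeout.  If `ahead`
    is true, the split falls ahead of the current batch (it stays
    first next run); otherwise behind it (it moves to the tail).
    """
    split = current_index if ahead else current_index + 1
    # Unprocessed tail goes to the front; the already-attempted head
    # is appended, so hard-to-deliver batches get equal attempts
    # over successive runs.
    return queue[split:] + queue[:split]

# From the example: B succeeded and left the queue, and qrunner timed
# out on D, so the survivors are A, C, D, E, F with D (index 2)
# current.  Solution 2 above: pick the split side at random each run.
ahead = random.choice((True, False))
next_run = requeue_on_timeout(['A', 'C', 'D', 'E', 'F'], 2, ahead)
```

Splitting behind the current batch reproduces the `E F A C D` ordering above; splitting ahead keeps D first so it is guaranteed a full attempt next run.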
Does this seem to anyone else to solve the problem? I haven't looked at the code yet, so this is just cursory thought.
-- -D. dgc@uchicago.edu NSIT University of Chicago
Thanks David,
From what I've seen of how Mailman's qrunner works, this would help my situation tremendously.
As long as this is going to the developers list, what do you all think of the possibility of adding the "filebase" to the log line of the smtp-failure log and/or the smtp log? I know this would increase the size of the logs, so maybe it would be an option/flag set in the Defaults.py/mm_cfg.py files? This would have been very helpful in tracking down those files that were sucking all the time out of my qrunner jobs.
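The suggested change amounts to tagging each smtp/smtp-failure log line with the queue entry's filebase, behind a config flag. A minimal sketch of the idea; the function name and the flag are made up for illustration, not real Mailman identifiers:

```python
def format_smtp_log_line(listname, recipient, status, filebase, log_filebase):
    """Build an smtp / smtp-failure style log line, optionally tagged
    with the queue entry's filebase.  `log_filebase` stands in for a
    hypothetical Defaults.py/mm_cfg.py flag.
    """
    line = '%s: %s: %s' % (listname, recipient, status)
    if log_filebase:
        # The filebase identifies the queued file, so a slow or
        # failing delivery can be traced back to its queue entry.
        line = '%s [filebase: %s]' % (line, filebase)
    return line

tagged = format_smtp_log_line('mylist', 'user@example.com',
                              'delivery failed', '0123abcd', True)
```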
Regards,
DJ
--
D.J. Atkinson | dj@pcisys.net
"DJA" == D J Atkinson <dj@pcisys.net> writes:
DJA> As long as this is going to the developers list, what do you
DJA> all think of the possibility of adding the "filebase" to the
DJA> log line of the smtp-failure log and/or the smtp log? I know
DJA> this would increase the size of the logs, so maybe it would
DJA> be an option/flag set in the Defaults.py/mm_cfg.py files?
DJA> This would have been very helpful in tracking down those
DJA> files that were sucking all the time out of my qrunner jobs.
Great idea!
One more thing...
> Actually, that's not true if the queue is reduced to containing only A, C, and D, and qrunner always times out on D; D will never get the same time as A and C. Leaving D at the head of the queue (that is, splitting the queue ahead of the current batch, rather than behind it) solves that problem, until the case occurs in which D contains enough bad or slow addresses to stop the queue even though it's first. Two solutions to this: 1) never stop qrunner during the first queued batch (always wait for it to exit); or 2) split the queue ahead of or behind the current batch at random.
I'm obviously not the expert, but my observations indicate that qrunner does complete the current message batch before checking to see if it's exceeded the "QRUNNER_PROCESS_LIFETIME" value, so you could always set it to the next message in the queue.
--
D.J. Atkinson | dj@pcisys.net
"DJA" == D J Atkinson <dj@pcisys.net> writes:
DJA> I'm obviously not the exepert, but my observations indicate
DJA> that qrunner does complete the current message batch before
DJA> checking to see if it's exceeded the
DJA> "QRUNNER_PROCESS_LIFETIME" value, so you could always set it
DJA> to the next message in the queue.
Actually, it doesn't. It checks QRUNNER_PROCESS_LIFETIME before processing every file in the directory listing.
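The loop Barry describes checks the elapsed lifetime before each queued file, never in the middle of one, so a file already underway finishes but no new file starts once the limit is passed. A sketch under those assumptions (`deliver` stands in for per-file delivery; the injectable `clock` exists only to make the sketch testable, and nothing here is actual Mailman code):

```python
import time

def run_queue(filebases, process_lifetime, deliver, clock=time.time):
    """Process queued files until QRUNNER_PROCESS_LIFETIME-style
    limit is exceeded, checking before each file as Barry describes.
    """
    start = clock()
    processed = []
    for filebase in filebases:
        if clock() - start > process_lifetime:
            break  # lifetime exceeded; the rest stays queued
        deliver(filebase)
        processed.append(filebase)
    return processed
```

With a 15-minute lifetime and queued files that each take 400 seconds to time out, this is exactly why only part of the queue gets processed per run.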
-Barry
participants (3)
- barry@digicool.com
- D.J. Atkinson
- David Champion