[Mailman-Users] Manipulate mailman in / out queue

Xueshan Feng sfeng at stanford.edu
Sat Oct 20 07:35:13 CEST 2012

Hi Mark,

Thank you so much for taking the time to explain the details of queue
processing, the meaning of the file names in the queue, and the right way
to handle files in the queue (mv vs. cp).

I have been managing the campus mailing list service for a long time. Once
in a while we get mailbombed, and messages get stuck in the queue and take a
long time to drain. I have tried moving files and moving directories, with
and without restarting the service. Sometimes the queue would get processed
quickly after my intervention; sometimes it would not. I also worry about
losing messages by manually messing with the queue files.

It helps a lot if one understands how things work. No more trial and error
next time if I have to handle backlogs! Thank you!



On Tue, Oct 16, 2012 at 9:23 PM, Mark Sapiro <mark at msapiro.net> wrote:

> Xueshan Feng wrote:
> >
> >On Mon, Oct 15, 2012 at 9:35 PM, Mark Sapiro <mark at msapiro.net> wrote:
> >
> >> This is really more involved than I can explain without a keyboard,
> >> which I won't have before Tues eve, but there should be only one .bak
> >> file, or one per slice if the runner is sliced. This is the message
> >> currently being processed. All others are ignored by the current runner
> >> (they will be "recovered" if the runner is restarted).
> >>
> >
> >This helps a lot already. We do have multiple runners.
>
> Here are the gory details. All the heavy lifting is done by methods of
> the Switchboard class defined in Mailman/Queue/Switchboard.py.
>
> Any particular runner is specific to a particular queue or slice of a
> queue. The out/ queue is processed by OutgoingRunner. If it isn't
> sliced, it processes the whole queue. If it is sliced, there are N
> slices.
>
> Note: The filename of a queue entry consists of a time stamp, a '+', a
> 40-hex-digit hash and the extension (.pck or .bak). A slice consists
> of (1/N)th of the hash space. E.g., if N = 4, slice 0 is all hashes
> with first hex digit = 0, 1, 2 or 3; slice 1 is all hashes with first
> hex digit = 4, 5, 6 or 7; slice 2 is all hashes with first hex digit =
> 8, 9, A or B, and slice 3 is all hashes with first hex digit = C, D, E
> or F.
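The filename layout and slice arithmetic described above can be illustrated with a short Python sketch. It assumes the timestamp parses as a float and that the 40-hex-digit hash is treated as a 160-bit integer; parse_entry, slice_bounds and slice_of are hypothetical names for illustration, not Mailman's API.

```python
# Hypothetical helpers illustrating queue-entry names and hash-space slicing;
# these are not Mailman functions.
SHAMAX = 2 ** 160 - 1  # largest value a 40-hex-digit (160-bit) hash can take


def parse_entry(filename):
    """Split '<timestamp>+<40 hex digits>.<ext>' into its three parts."""
    base, ext = filename.rsplit(".", 1)
    when, digest = base.split("+", 1)
    return float(when), digest, ext


def slice_bounds(whichslice, numslices):
    """Inclusive (lower, upper) hash bounds owned by slice `whichslice` of N."""
    lower = (SHAMAX + 1) * whichslice // numslices
    upper = (SHAMAX + 1) * (whichslice + 1) // numslices - 1
    return lower, upper


def slice_of(digest, numslices):
    """Index of the slice whose bounds contain this hex digest."""
    value = int(digest, 16)
    for k in range(numslices):
        lower, upper = slice_bounds(k, numslices)
        if lower <= value <= upper:
            return k
    raise ValueError("digest is outside the 160-bit hash space")


if __name__ == "__main__":
    name = "1350371000.25+" + "9" + "0" * 39 + ".pck"
    when, digest, ext = parse_entry(name)
    print(when, ext, slice_of(digest, 4))  # first hex digit 9 -> slice 2 of 4
```

With N = 4 this reproduces the grouping above: digests starting with 0-3 fall in slice 0, 4-7 in slice 1, 8-B in slice 2 and C-F in slice 3.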
>
> A particular slice of OutgoingRunner initializes its Switchboard
> instance once at startup or restart. This creates the queue directory
> (qfiles/out/, or whatever queue this runner processes) if necessary,
> sets the upper and lower hash bounds for its slice if sliced, and
> normally recovers all the .bak files in its slice. Recovery consists
> of incrementing a recovery count in the entry's metadata and renaming
> it from *.bak to *.pck. Thus, immediately after (re)starting a runner,
> there will be no *.bak files in its slice. The counter is to stop
> loops where messages crash the runner. A .bak file will be recovered
> at most 3 times and then moved to qfiles/bad/*.psv.
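A simplified Python model of this recovery step follows. It assumes each entry is a single pickled dict and keeps the counter under a key named _bak_count; real Mailman entries store the message and its metadata separately, and this sketch ignores slicing. recover_backup_files and MAX_BAK_COUNT are names chosen for illustration.

```python
import os
import pickle

MAX_BAK_COUNT = 3  # recover at most 3 times, per the description above


def recover_backup_files(queue_dir, bad_dir):
    """Rename every *.bak in queue_dir back to *.pck, bumping its counter.

    Simplified model: each entry is one pickled dict and slicing is ignored.
    An entry recovered more than MAX_BAK_COUNT times is preserved as
    bad_dir/<base>.psv instead of being re-queued.
    """
    for name in os.listdir(queue_dir):
        if not name.endswith(".bak"):
            continue
        base = name[: -len(".bak")]
        bak = os.path.join(queue_dir, name)
        with open(bak, "rb") as f:
            data = pickle.load(f)
        count = data.get("_bak_count", 0) + 1
        if count > MAX_BAK_COUNT:
            # This entry keeps crashing the runner: take it out of the loop.
            os.rename(bak, os.path.join(bad_dir, base + ".psv"))
            continue
        data["_bak_count"] = count
        with open(bak, "wb") as f:
            pickle.dump(data, f)
        # Renaming .bak -> .pck makes it eligible for normal processing again.
        os.rename(bak, os.path.join(queue_dir, base + ".pck"))
```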
>
> After initialization, a runner first obtains a list of all the .pck
> files in its slice, sorted by timestamp so the list is FIFO. It then
> processes the list until the list is exhausted, sleeps for a second
> and gets a new list and repeats the process. If the new list is empty,
> it just sleeps a second and tries again until it gets one or more
> entries to process.
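The list, process, sleep, re-list cycle can be modelled as below. This is a hypothetical simplification: list_pck_files and run are made-up names, the optional lower/upper bounds stand in for a slice's hash range, and a real runner would also handle stop signals and errors.

```python
import os
import time


def list_pck_files(queue_dir, lower=None, upper=None):
    """Return the base names of *.pck entries in one slice, oldest first."""
    entries = []
    for name in os.listdir(queue_dir):
        base, ext = os.path.splitext(name)
        if ext != ".pck":
            continue  # .bak (in process) and anything else is skipped
        when, digest = base.split("+", 1)
        if lower is not None and not (lower <= int(digest, 16) <= upper):
            continue  # belongs to another slice
        entries.append((float(when), base))
    entries.sort()  # FIFO by timestamp
    return [base for _, base in entries]


def run(queue_dir, handle, lower=None, upper=None, pause=1.0, max_passes=None):
    """Process the current list, sleep, and re-list (forever by default).

    `handle(base)` is expected to dequeue the entry so it is not listed again.
    """
    passes = 0
    while max_passes is None or passes < max_passes:
        for base in list_pck_files(queue_dir, lower, upper):
            handle(base)
        time.sleep(pause)
        passes += 1
```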
>
> Processing consists of renaming the file from *.pck to *.bak,
> unpickling it and processing it. If it crashes in processing, it will
> recover the .bak file upon restart. Thus, there should never be more
> than one .bak file per slice.
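A minimal sketch of that claim-process-finish sequence, assuming each entry is a single pickled object. dequeue, finish and process_one are hypothetical names; the point is that the .bak file is removed only after processing succeeds, so a crash leaves exactly one .bak behind for recovery.

```python
import os
import pickle


def dequeue(queue_dir, base):
    """Claim an entry by renaming .pck -> .bak, then unpickle it."""
    pck = os.path.join(queue_dir, base + ".pck")
    bak = os.path.join(queue_dir, base + ".bak")
    os.rename(pck, bak)  # now the single "in process" entry for this slice
    with open(bak, "rb") as f:
        return pickle.load(f)


def finish(queue_dir, base):
    """Processing succeeded: the .bak copy is no longer needed."""
    os.unlink(os.path.join(queue_dir, base + ".bak"))


def process_one(queue_dir, base, handle):
    data = dequeue(queue_dir, base)
    handle(data)  # if this raises, the .bak stays behind for recovery on restart
    finish(queue_dir, base)
```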
>
> >> Note that part of the slowness at this point is due to the size of the
> >> out directory.
> >
> >
> >I was able to flush the queue today by moving the long-lasting *.bak files
> >out of the way and, at the same time, stopping Postfix to allow Mailman to
> >process its queue. It took about half an hour to process 8000+ messages.
> >Without manual intervention, it might have taken a few hours.
> >
> >> You can address this by stopping Mailman, moving qfiles/out aside,
> >> starting Mailman (which should recreate qfiles/out at the first message
> >> if not before) and then moving old entries back a few at a time.
> >>
> >
> >I think I've done that before. So moving files back into the queue in
> >batches doesn't require stopping Mailman?
>
> First of all, the actual physical size of the queue directory affects
> processing. Every time an entry is added to the queue, and every time
> a .pck file is renamed to .bak, the entire physical directory must be
> searched to ensure this isn't a duplicate name. Depending on OS
> settings, cache sizes and the physical directory size, this may
> actually involve multiple disk reads each time. Thus, if the
> qfiles/out/ directory has grown large because 8000+ messages were
> added to the queue when the runner couldn't handle them (and there may
> have been more in the retry/ queue because of SMTP failures), it would
> benefit from shrinking. This is accomplished by moving (mv) or
> renaming the queue directory itself aside, not just its contents, and
> then letting the runner recreate it when it starts. Then, if
> necessary, move messages back a few at a time so the directory doesn't
> grow large again.
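A sketch of this procedure in Python, under several assumptions: Mailman is stopped before set_queue_aside runs and restarted afterwards, the aside directory stays on the same file system as the queue (so os.rename is a real rename), and the batch_size/max_backlog/poll values are illustrative only. Both function names are hypothetical. Only *.pck entries are moved back; any .bak files are left in the aside directory to be handled separately.

```python
import os
import time


def set_queue_aside(queue_dir):
    """Rename the queue directory itself (not its contents) out of the way.

    Run this only while Mailman is stopped; the runner recreates queue_dir.
    """
    aside = "%s.aside.%d" % (queue_dir.rstrip(os.sep), int(time.time()))
    os.rename(queue_dir, aside)
    return aside


def move_back_in_batches(aside_dir, queue_dir, batch_size=100,
                         max_backlog=200, poll=5.0):
    """Move *.pck entries back oldest-first, a batch at a time.

    A new batch is moved only while fewer than max_backlog entries are
    waiting, so queue_dir does not grow large again. Returns the number moved.
    """
    pending = sorted(
        (n for n in os.listdir(aside_dir) if n.endswith(".pck")),
        key=lambda n: float(n.split("+", 1)[0]),
    )
    moved = 0
    while pending:
        backlog = sum(1 for n in os.listdir(queue_dir) if n.endswith(".pck"))
        if backlog >= max_backlog:
            time.sleep(poll)  # let the running runner drain the queue
            continue
        batch, pending = pending[:batch_size], pending[batch_size:]
        for name in batch:
            # mv, not cp: the entry appears in queue_dir already complete
            os.rename(os.path.join(aside_dir, name),
                      os.path.join(queue_dir, name))
        moved += len(batch)
    return moved
```

Keeping max_backlog small relative to the original backlog is what keeps the physical directory small; the trade-off is that the script waits on the runner between batches.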
>
> >The real operational question is whether we have to stop and start
> >Mailman each time we move files; for large-volume queues, that would
> >mean a lot of manual work. The procedure I have used is:
> >
> >- stop mailman
> >- move queue files or .bak file aside
>    Move the whole directory, not the contents.
> >- start mailman
> >- move some files, or the .bak file, back into the queue
> >(note: files are moved back while Mailman is running)
>
> Moving (mv or rename) files back from the same file system while
> Mailman is running is fine. When the entry appears in the directory in
> this case, the file contents are complete. This is essentially what
> Mailman does when it makes a queue entry. Copying (cp) is not good
> because there can be a directory entry for the file before its
> contents are complete, and a runner could read an incomplete file.
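If an entry really has to be copied (for example from another file system, where mv itself falls back to copy-and-delete), a common pattern is to copy it under a name the runner ignores and then rename it into place. This Python sketch assumes, per the description of the runner above, that only *.pck names are picked up; copy_into_queue is a hypothetical helper, and os.rename is atomic only within a single POSIX file system.

```python
import os
import shutil


def copy_into_queue(src, queue_dir):
    """Copy a queue entry into queue_dir without exposing a partial file."""
    final = os.path.join(queue_dir, os.path.basename(src))
    tmp = final + ".tmp"  # does not end in .pck, so a runner ignores it
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # contents are on disk before the name appears
    os.rename(tmp, final)  # the .pck name appears only once the file is complete
    return final
```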
> --
> Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan

Xueshan Feng
Infrastructure Delivery Group, IT Services
Stanford University
