[Mailman-Users] Problems with multi-machine slicing

Mark Sapiro mark at msapiro.net
Sun May 25 00:56:36 CEST 2014

On 05/24/2014 03:05 PM, Jeff Taylor wrote:
> After stopping mailman, machine #1 shows:
> May 24 15:23:34 2014 (11512) Master qrunner detected subprocess exit
> (pid: 11516, sig: None, sts: 15, class: IncomingRunner, slice: 1/3)
> Machine #2:
> May 24 15:21:56 2014 (12767) Master qrunner detected subprocess exit
> (pid: 12769, sig: None, sts: 15, class: BounceRunner, slice: 2/3)
> Machine #3:
> May 24 15:22:16 2014 (31849) Master qrunner detected subprocess exit
> (pid: 31858, sig: None, sts: 15, class: VirginRunner, slice: 3/3)

OK, that looks good.

> Now for even more strangeness...  After restarting mailman I sent
> another test message.  Just so you know, my test list has three email
> addresses in it, so I would expect the messages to get split up
> generally between the three machines (and please confirm my
> understanding... if the list has three users on it, each one of the
> three machines should forward one message to one user from the list?).

No. That's not the way it works. See below.

> However after restarting and sending 7 more tests, it seems to bounce
> between machine #1 and #2 sending the messages.  In each case, one
> machine sends the message to ALL users.  After waiting about 15 minutes
> I sent several more test messages.  Now it seems to be randomly picking
> one of the three machines to send from, but again the copy to all users
> is sent from that one machine.  I suppose that is better than it was --
> at least now all three machines are being used.  Is this the way it's
> supposed to be working?

I think so.

Here's the detail. First the general flow.

1) A post arrives and is queued in the in/ queue.
2) It is picked up by IncomingRunner and processed through the handler
pipeline.
3) Assuming it is not held for any reason, it will get queued in the
archive/ queue for ArchRunner and in the out/ queue for OutgoingRunner.
It will also be added to the list's digest.mbox, to be sent eventually
to digest members as part of a digest, which will be created and queued
in the virgin/ queue for VirginRunner, which will ultimately queue it in
out/ for delivery.
4) ArchRunner will pick up the message from the archive/ queue and
archive it.
5) OutgoingRunner will pick up the message from the out/ queue and
deliver it to the recipients.
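
As a rough illustration of the flow above, here is a toy model of the
queue hops in Python. It is only a sketch of the description in this
message, not Mailman code: the names RUNNER_FOR_QUEUE, NEXT_QUEUES and
trace are made up for the example, and held messages and the
digest.mbox step are left out.

```python
from collections import deque

# Which runner services each queue directory (from the steps above).
RUNNER_FOR_QUEUE = {
    "in": "IncomingRunner",
    "archive": "ArchRunner",
    "virgin": "VirginRunner",
    "out": "OutgoingRunner",
}

# Where a runner requeues a message after handling it.
# Assumes the post is not held; archive/ and out/ are terminal.
NEXT_QUEUES = {
    "in": ["archive", "out"],
    "archive": [],
    "virgin": ["out"],
    "out": [],
}


def trace(start):
    """Return the (queue, runner) hops a message makes from a start queue."""
    hops = []
    pending = deque([start])
    while pending:
        queue = pending.popleft()
        hops.append((queue, RUNNER_FOR_QUEUE[queue]))
        pending.extend(NEXT_QUEUES[queue])
    return hops


if __name__ == "__main__":
    print(trace("in"))      # a normal post
    print(trace("virgin"))  # a digest created by Mailman itself
```

Each hop is a separate queue entry, which matters for slicing because
every entry gets its own file name.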

Before we look at slicing, note that once OutgoingRunner has a
message, it will deliver it to all its recipients, so a single post
will always be delivered from the one machine whose OutgoingRunner
picked it up from the out/ queue.

Now for slicing. Whenever a message is queued, whether for the in/ queue
by mail delivery or some other queue by some handler or other process,
it gets a file name of the form tttt+hhhhhhhh.pck. The tttt part is a
time stamp so we can ensure FIFO processing. The hhhhhhhh part is a hex
digest of a SHA-1 hash of the message, the listname and the current time.
Slicing works by dividing that hash space into n equal slices (in your
case 3 with slice 0 being the first third, slice 1 the middle third and
slice 2 the last third).

So when a runner that is processing slice 0, say, looks at its queue, it
will only process those messages in the first third of the hash space.
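
To make the arithmetic concrete, here is a minimal Python sketch of the
idea. It is an illustration under assumptions, not the code Mailman
actually runs: the function names are invented, the message is hashed
as plain text, and the time is taken from time.time().

```python
import hashlib
import time

SHA1_MAX = (1 << 160) - 1  # largest value a 160-bit SHA-1 digest can take


def slice_bounds(slice_index, num_slices):
    """Return the inclusive (lower, upper) hash range for one slice."""
    lower = ((SHA1_MAX + 1) * slice_index) // num_slices
    upper = ((SHA1_MAX + 1) * (slice_index + 1)) // num_slices - 1
    return lower, upper


def queue_filename(message_text, listname, now=None):
    """Build a 'timestamp+hexdigest.pck' name from message, listname, time."""
    if now is None:
        now = time.time()
    hashfood = (message_text + listname + repr(now)).encode("utf-8")
    return "%r+%s.pck" % (now, hashlib.sha1(hashfood).hexdigest())


def slice_of(filename, num_slices):
    """Return the index of the slice whose range contains the file's digest."""
    base = filename[:-len(".pck")]
    digest = base.rsplit("+", 1)[1]
    value = int(digest, 16)
    for i in range(num_slices):
        lower, upper = slice_bounds(i, num_slices)
        if lower <= value <= upper:
            return i
    raise ValueError("digest outside the SHA-1 hash space")


if __name__ == "__main__":
    name = queue_filename("Subject: test\n\nhello\n", "testlist")
    print(name, "-> slice", slice_of(name, 3))
```

A runner for slice i would simply skip any file in its queue directory
for which slice_of() returns something other than i.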

So bottom line, an incoming message will be queued in the in/ queue and
it has an equal chance of being in any slice and will be picked up by
the machine processing that slice. Then the message will be later
requeued in out/ probably with a different hash. The time is in seconds
and may or may not have changed, but the message has likely changed due
to subject prefixing, content filtering and/or header refolding. So it
will be picked up by the OutgoingRunner processing its slice, and that
one runner will deliver to all recipients.
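
Here is a small Python sketch of why the in/ and out/ entries for the
same post can land in different slices. Again this is only an
illustration: the helper names, the list name "testlist", the
"[Testlist]" subject prefix and the fixed time stamp are all made up
for the example.

```python
import hashlib

HASH_SPACE = 1 << 160  # number of distinct SHA-1 digest values


def digest_for(message_text, listname, now):
    """Hex SHA-1 digest of the message, the listname and the time."""
    hashfood = (message_text + listname + repr(now)).encode("utf-8")
    return hashlib.sha1(hashfood).hexdigest()


def slice_for(hexdigest, num_slices=3):
    """Return which of num_slices equal hash ranges holds this digest."""
    value = int(hexdigest, 16)
    for i in range(num_slices):
        lower = (HASH_SPACE * i) // num_slices
        upper = (HASH_SPACE * (i + 1)) // num_slices - 1
        if lower <= value <= upper:
            return i
    raise ValueError("digest outside the SHA-1 hash space")


now = 1401000000  # hypothetical time stamp, kept the same for both entries
as_received = "Subject: test 7\n\nhello\n"             # as queued in in/
as_requeued = "Subject: [Testlist] test 7\n\nhello\n"  # after subject prefixing

in_digest = digest_for(as_received, "testlist", now)
out_digest = digest_for(as_requeued, "testlist", now)

print("in/  digest", in_digest, "slice", slice_for(in_digest))
print("out/ digest", out_digest, "slice", slice_for(out_digest))
```

Even with an identical time stamp the two digests differ, so the slice
(and therefore the machine) that handles the out/ entry is independent
of the one that handled the in/ entry.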

> Regarding the upgrade version, it's been too long, I'm afraid I don't
> know what the old version was.  The old machines are running ubuntu
> oneiric and now have mailman 2.1.14.  The newer machines have debian
> wheezy and mailman 2.1.15.  The upgrades happened a few months back, but
> I only noticed the issue yesterday because I am trying to get rid of the
> ubuntu machines and replace them with the debian machines.  The messages
> have been getting delivered, but apparently one machine was handling
> everything.

I was curious because it would help me know if there had been relevant
changes, but I think it's working as it's supposed to and probably as it
was before.

Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
