[Mailman-Developers] Re: [Mailman-Users] optimizing mail delivery

Barry A. Warsaw bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Wed, 17 Nov 1999 20:37:58 -0500 (EST)

>>>>> "PT" == Paul Tomblin <ptomblin@xcski.com> writes:

    PT> Wrong.  Sendmail *does* do this for you.  If you have a bunch
    PT> of people on a list in random order, sendmail will work
    PT> through the list in order, but every time it connects to a
    PT> domain, it will send one message to *all* the members of the
    PT> list at that domain.  One of my most popular lists has at
    PT> least 30% of its subscribers at panix.com, and another 10% at
    PT> best.com.  When it was on majordomo, the entire list was
    PT> passed off to sendmail in one chunk.  Since one of the
    PT> panix.com people is first on the list, all the panixians got
    PT> their mail first, then a few more people, then all the people
    PT> at best.com, etc.  Now that I'm using mailman, and it's split
    PT> into 5 chunks, I've noticed that one of the chunks is all
    PT> panixians, which goes in one message to panix, and one of the
    PT> chunks is almost all bestians, which also goes in one message.

So do you think the Mailman way is better or worse?  I'm curious
because I'm trying to decide whether I should port Mailman 1.0's bulk
mailer code to the new message pipeline.

In the current 1.2 code base (available via the anonCVS), I os.popen()
sendmail[1] passing the entire recipient list on the command line,
then I pipe the message text to stdin and let the MTA do the rest.
There's one complication: if the length of the recipient list is
greater than a certain limit (currently 3000), I chop it up into
multiple popens, but I don't do any sorting of the recipient list.
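
A rough sketch of that chunking, for illustration only and not the
actual 1.2 code. It assumes the 3000 limit counts characters of the
space-joined recipient list, and the names chunk_recipients and
deliver are made up here. It uses subprocess rather than os.popen(),
and passes -f to set the envelope sender.

```python
import subprocess


def chunk_recipients(recips, max_len=3000):
    """Split recips into groups whose space-joined length is <= max_len.

    Order is preserved; no sorting is done. An address that is longer
    than max_len on its own still gets a chunk to itself.
    """
    chunks, current, size = [], [], 0
    for addr in recips:
        # One extra character for the separating space, except for the
        # first address in a chunk.
        extra = len(addr) + (1 if current else 0)
        if current and size + extra > max_len:
            chunks.append(current)
            current, size = [], 0
            extra = len(addr)
        current.append(addr)
        size += extra
    if current:
        chunks.append(current)
    return chunks


def deliver(sendmail_cmd, sender, recips, msgtext, max_len=3000):
    """Run one sendmail-compatible process per chunk (hypothetical helper)."""
    for chunk in chunk_recipients(recips, max_len):
        # Recipients go on the command line, the message text on stdin.
        subprocess.run([sendmail_cmd, "-f", sender] + chunk,
                       input=msgtext.encode(), check=True)
```

With max_len=16, the list a@x.com, b@x.com, c@y.org splits into two
chunks: the first two addresses together, then the third alone.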

The advantage I see in this is that sendmail can do its thing
asynchronously, without keeping the list object locked the entire
time.  The disadvantage is that Mailman is only aware of delivery
problems if the delivery bounces.

I could see trying to sort the recips on the domain name if the total
length is > 3000.  I can also see porting the local-smtpd delivery
scheme of the current bulk mailer, but fixing some of the problems in
the 1.0 code base (and there are quite a few).  I'm leery though of
stepping on the MTA's toes too much -- a good MTA should just do
the right thing.
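
Sorting on the domain could look something like the sketch below.
The function names are invented for illustration, and it assumes
plain user@domain addresses. The point is that recipients at the
same domain end up adjacent, so they tend to land in the same chunk
when the list is split.

```python
def domain_key(addr):
    """Sort key: domain first, then local part, both case-folded."""
    local, sep, domain = addr.rpartition("@")
    if not sep:
        # No "@" at all: treat it as a local address with no domain.
        local, domain = addr, ""
    return (domain.lower(), local.lower())


def sort_by_domain(recips):
    """Return a new list with same-domain recipients grouped together."""
    return sorted(recips, key=domain_key)
```

For example, bob@panix.com, al@best.com, amy@PANIX.com, zed@best.com
comes back as al@best.com, zed@best.com, amy@PANIX.com,
bob@panix.com.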

Another alternative, which would be less work and would delegate all
delivery to the MTA, is to just pump all the recips to the local smtpd
via smtplib.py.  The advantage here is that again we're MTA
independent, but the disadvantage is that Mailman's delivery is
synchronous with the smtpd.  We'd have to be very sure to unlock the
list object during this transaction (while watching out for race
conditions, locking again if failure statuses are handled directly,
etc.)  More code, more opportunity for bugs.
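
A minimal sketch of the smtplib route, under some assumptions: the
smtpd listens on localhost port 25, the function name
deliver_via_smtp is made up, and the connect parameter exists only
so a stand-in can be swapped in for testing. The list locking is
not shown; the caller would unlock the list before this call and
lock it again only if the returned failures need handling.

```python
import smtplib


def deliver_via_smtp(sender, recips, msgtext,
                     host="localhost", port=25, connect=smtplib.SMTP):
    """Hand the message to the local smtpd in one SMTP transaction.

    Returns a dict mapping each refused recipient to its
    (code, response) pair; an empty dict means all were accepted.
    """
    conn = connect(host, port)
    try:
        try:
            # sendmail() returns the recipients that were refused when
            # at least one recipient was accepted.
            refused = conn.sendmail(sender, recips, msgtext)
        except smtplib.SMTPRecipientsRefused as e:
            # Raised when every recipient was refused.
            refused = e.recipients
    finally:
        conn.quit()
    return refused
```

This blocks until the smtpd has accepted or refused each recipient,
which is the synchronous cost mentioned above, but the failures are
visible immediately instead of arriving later as bounces.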



[1] I actually don't use sendmail; I use postfix.  But the code is
easily configured to use anything with a sendmail-compatible command
line interface.