Mailman 3 Re: [Mailman-Developers] [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman - Mailman-Developers

Re: [Mailman-Developers] [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

Tokio Kikuchi

Sept. 27, 2006

2:35 a.m.

bwarsaw@users.sourceforge.net wrote:

...

Revision: 8041 http://svn.sourceforge.net/mailman/?rev=8041&view=rev Author: bwarsaw Date: 2006-09-25 00:53:58 -0700 (Mon, 25 Sep 2006)

Log Message: ----------- Another milestone: you can now post to lists. Converted the following to use the new configuration object: admin, admindb, bounces, confirm, inject, join, leave, owner, post, request, unshunt, version.

Also change MailList.GetScriptURL() to return the list's fully qualified name in links.

I've tested a bit.

Postfix.py: got duplicate warning while creating virtual-mailman. Shouldn't we include hostname in aliases? Otherwise, true virtual hosting breaks.

MailList.py: got error when name is None. Looks like full_path is hostname@listname order. Is this name + '@' + self.host_name?

I needed this patch to create new lists:

mailman@colinux:~/src/svn.trunk$ svn diff Mailman
Index: Mailman/MTA/Postfix.py
===================================================================
--- Mailman/MTA/Postfix.py (revision 8041)
+++ Mailman/MTA/Postfix.py (working copy)
@@ -142,8 +142,9 @@
     print >> fp, '# CREATED:', time.ctime(time.time())
     # Now add all the standard alias entries
     for k, v in makealiases(mlist):
+        fqdnaddr = '%s@%s' % (k, hostname)
         # Format the text file nicely
-        print >> fp, mlist.fqdn_listname, ((fieldsz - len(k)) * ' '), k
+        print >> fp, fqdnaddr, ((fieldsz - len(k)) * ' '), k
     # Finish the text file stanza
     print >> fp, '# STANZA END:', listname
     print >> fp
Index: Mailman/MailList.py
===================================================================
--- Mailman/MailList.py (revision 8041)
+++ Mailman/MailList.py (working copy)
@@ -287,14 +287,17 @@
             os.path.join(config.LOCK_DIR, name or '<site>') + '.lock',
             lifetime=config.LIST_LOCK_LIFETIME)
         # XXX FIXME Sometimes name is fully qualified, sometimes it's not.
-        if '@' in name:
-            self._internal_name, self.host_name = name.split('@', 1)
-            self._full_path = os.path.join(config.LIST_DATA_DIR, name)
-        else:
-            self._internal_name = name
-            self.host_name = config.DEFAULT_EMAIL_HOST
-            self._full_path = os.path.join(config.LIST_DATA_DIR,
-                                           self.host_name + '@' + name)
+        if name:
+            if '@' in name:
+                self._internal_name, self.host_name = name.split('@', 1)
+                self._full_path = os.path.join(config.LIST_DATA_DIR, name)
+            else:
+                self._internal_name = name
+                self.host_name = config.DEFAULT_EMAIL_HOST
+                self._full_path = os.path.join(config.LIST_DATA_DIR,
+                                               self.host_name + '@' + name)
+        else:
+            self._full_path = ''
         # Only one level of mixin inheritance allowed
         for baseclass in self.__class__.__bases__:
             if hasattr(baseclass, 'InitTempVars'):

--
Tokio Kikuchi, tkikuchi@is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/
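The name handling in the MailList.py hunk can be looked at in isolation. What follows is a minimal Python 3 sketch, not the code in trunk: `split_list_name` and its parameters are hypothetical, and it assumes the data directory is always keyed as listname@hostname, which is only one possible answer to the ordering question raised above.

```python
import os

def split_list_name(name, default_host, list_data_dir):
    """Return (internal_name, host_name, full_path) for a list name.

    Hypothetical helper: the function name and arguments are not from Mailman.
    """
    if not name:
        # No list name given, so there is no list directory to locate.
        return None, None, ''
    if '@' in name:
        internal_name, host_name = name.split('@', 1)
    else:
        internal_name, host_name = name, default_host
    # Assumption: always build the path in one order, listname@hostname.
    full_path = os.path.join(list_data_dir, internal_name + '@' + host_name)
    return internal_name, host_name, full_path
```

With a single ordering, a fully qualified name and a bare name that falls back to the default host end up in the same kind of directory, which is the inconsistency the XXX FIXME comment hints at.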


Bob Puff@NLE

September 2006

3:01 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

Tokio Kikuchi wrote:

...

Postfix.py: got duplicate warning while creating virtual-mailman. Shouldn't we include hostname in aliases? Otherwise, true virtual hosting breaks.

the virtual file should indeed contain the hostname. The aliases file should not. These are two separate files, both of which are necessary.

Example:

data/virtual-mailman:

mailman@nle.com        mailman
mailman-admin@nle.com  mailman-admin
...etc...

data/aliases:

mailman:        "|/home/mailman/mail/mailman post mailman"
mailman-admin:  "|/home/mailman/mail/mailman admin mailman"
...etc...

Both should be created, and both should be deleted when the list is removed. Currently, this is not the case.
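The two-file layout described here can be sketched as a small generator. This is an illustration only: `make_entries` is a hypothetical helper, it covers just the post and admin addresses shown in the example, and the wrapper path is the one from the example rather than anything Mailman guarantees.

```python
def make_entries(listname, hostname, wrapper='/home/mailman/mail/mailman'):
    """Build (virtual_lines, alias_lines) for one list.

    Hypothetical helper; only the 'post' and 'admin' addresses are covered.
    """
    suffixes = [('', 'post'), ('-admin', 'admin')]
    virtual_lines = []
    alias_lines = []
    for suffix, command in suffixes:
        local = listname + suffix
        # virtual file: fully qualified address on the left, local alias on the right
        virtual_lines.append('%s@%s %s' % (local, hostname, local))
        # aliases file: local alias piped to the wrapper, no hostname involved
        alias_lines.append('%s: "|%s %s %s"' % (local, wrapper, command, listname))
    return virtual_lines, alias_lines
```

Generating both sets of lines from one call makes it easier to keep creation and removal symmetric, which is the gap pointed out above.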

Bob

Dale Newfield

4:51 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

Bob Puff@NLE wrote:

...

the virtual file should indeed contain the hostname. The aliases file should not. These are two separate files, both of which are necessary.

I mostly agree with you, but your solution won't allow true virtual hosting (having test@domain1.com and test@domain2.com be separate lists running on the same machine/mailman instance).

Maybe something like this modified example?

data/virtual-mailman:

...

mailman@nle.com        mailman_at_nle_com
mailman-admin@nle.com  mailman-admin_at_nle_com
...etc...

data/aliases:

...

mailman_at_nle_com:        "|/home/mailman/mail/mailman post mailman@nle.com"
mailman-admin_at_nle_com:  "|/home/mailman/mail/mailman admin mailman@nle.com"
...etc...

This of course raises the questions of how mailman distinguishes between the lists (what's the appropriate argument to the mailman binary?) and whether there are any characters besides "." that are disallowed in domain names (what is a good encoding to use as a valid local address?).
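The encoding used in the modified example can be written down as a one-line function. This is only a sketch of that particular scheme: `encode_alias` is a hypothetical name, and the code does not settle the open question of whether the result is unambiguous for every legal list name and domain.

```python
def encode_alias(local_part, domain):
    """Flatten local_part@domain into a single local alias name.

    Hypothetical helper mirroring the example: '@' becomes '_at_' and each
    '.' in the domain becomes '_'.
    """
    return '%s_at_%s' % (local_part, domain.replace('.', '_'))
```

For example, `encode_alias('mailman', 'nle.com')` gives `mailman_at_nle_com`, matching the left-hand side of the aliases entry above.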

-Dale

Barry Warsaw

7:04 p.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On Sep 27, 2006, at 12:51 AM, Dale Newfield wrote:

...

This of course raises the questions of how mailman distinguishes between the lists (what's the appropriate argument to the mailman binary?) and whether there are any characters besides "." that are disallowed in domain names (what is a good encoding to use as a valid local address?).

I've just read the Postfix documentation again to see if there are
any new features since the last time we implemented integration with
Postfix virtual domains. Here's the man page:

http://www.postfix.org/VIRTUAL_README.html

What looks interesting is the section entitled:

Postfix virtual MAILBOX example: separate domains, non-UNIX accounts

This appears to allow us to set up true virtual domains without
having to encode destination aliases. The trick though is that we
would use Maildir delivery for all incoming messages, something I'm
keen on switching to for Mailman 2.2 anyway. Maildir is way more
efficient than invoking a mail program per incoming message, Mailman
already supports Maildir (although it isn't the default), and AFAIK
all major MTAs support Maildir.
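To make the Maildir idea concrete, here is a minimal sketch of the receiving side using Python's standard `mailbox` module (Python 3, not the Python 2.4 of this thread). `drain_maildir` is a hypothetical helper and is not how Mailman's own Maildir support is written; it only shows that picking up delivered messages needs no process per message.

```python
import mailbox

def drain_maildir(path):
    """Return all messages currently in the maildir at `path`, removing each one.

    Hypothetical helper for illustration only.
    """
    md = mailbox.Maildir(path, create=True)
    drained = []
    for key in list(md.keys()):
        drained.append(md.get_message(key))
        # A real runner would hand the message to its queue before removing it.
        md.remove(key)
    return drained
```

A long-running process could call such a function periodically, so the MTA only has to write a file into the maildir for each incoming message.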

I'd like to know what you think about updating our Postfix virtual
delivery hooks to use this technique, and about making Maildir
delivery the default. We'd keep the old way around for MTAs that
don't support Maildir (which would be...?) but we'd deprecate that
delivery mechanism.

One thing I'm not sure of is the minimum version of Postfix this
would tie us to. On OS X 10.4 it looks like I've got Postfix 2.1.5
and according to postconf, it supports the necessary variables.

Thoughts?

-Barry

Carson Gaspar

1:33 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

--On Wednesday, September 27, 2006 3:04 PM -0400 Barry Warsaw <barry@python.org> wrote:

...

I'd like to know what you think about updating our Postfix virtual delivery hooks to use this technique, and about making Maildir delivery the default. We'd keep the old way around for MTAs that don't support Maildir (which would be...?) but we'd deprecate that delivery mechanism.

One thing I'm not sure of is the minimum version of Postfix this would tie us to. On OS X 10.4 it looks like I've got Postfix 2.1.5 and according to postconf, it supports the necessary variables.

I love the idea. A fork/exec per message always makes me twitch... I have a feeling it would also provide better fault-tolerance, especially in a replicated filesystem cluster, where you have clear atomic behaviour at your disposal.

-- Carson

Brad Knowles

2:36 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

At 6:33 PM -0700 9/27/06, Carson Gaspar wrote:

...

I love the idea. A fork/exec per message always makes me twitch... I have a feeling it would also provide better fault-tolerance, especially in a replicated filesystem cluster, where you have clear atomic behaviour at your disposal.

I agree that fork()/exec() is not an ideal model here, but then postfix doesn't use that model internally -- it uses a single parent with multiple child processes, and then hands off sockets. It also keeps pretty much the entire working queue in memory, as opposed to single-threading through the filesystem.

I don't see how using Maildir is going to solve any of these problems. IMO, if we're going to learn from postfix, I think we should learn the right things and take away the right lessons, and not just glom onto some alternative technique that has been known to have a whole host of other problems.

-- Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."

 -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
 Assembly to the Governor, November 11, 1755

Founding Individual Sponsor of LOPSA. See <http://www.lopsa.org/>.

Barry Warsaw

3:54 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On Sep 27, 2006, at 10:36 PM, Brad Knowles wrote:

...

At 6:33 PM -0700 9/27/06, Carson Gaspar wrote:

...
I love the idea. A fork/exec per message always makes me
twitch... I have a feeling it would also provide better fault-tolerance, especially
in a replicated filesystem cluster, where you have clear atomic
behaviour at your disposal.

I agree that fork()/exec() is not an ideal model here, but then postfix doesn't use that model internally -- it uses a single parent with multiple child processes, and then hands off sockets. It also keeps pretty much the entire working queue in memory, as opposed to single-threading through the filesystem.

I don't see how using Maildir is going to solve any of these problems. IMO, if we're going to learn from postfix, I think we should learn the right things and take away the right lessons, and not just glom onto some alternative technique that has been known to have a whole host of other problems.

Remember that we're only talking about how to most efficiently get
mail from the MTA via local delivery into the Mailman incoming
queue. Once in Mailman we'll handle the message our own way, not
with maildir.

This means that we're limited by what the various MTAs we want to
integrate with support. Until now, we've always gone with delivery
to a program, because that's supported by all MTAs. This is what
Carson was referring to by fork/exec -- the MTA must fork/exec
Mailman's mail wrapper (i.e. post) which basically just sucks the
message text from stdin and writes it to a file. It seems clear that
if you could eliminate that extra process (essentially a glorified
cat), you'd get a win. I think you're going to get about the same
amount of filesystem thrashing in either case, so you might as well
avoid the extra process overhead.
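The "glorified cat" can be sketched in a few lines. This is an assumption-laden illustration, not Mailman's mail wrapper: `enqueue_from_stream`, the `.tmp`/`.msg` suffixes, and the queue directory are all hypothetical. It shows how little work the per-message process does once it has been forked.

```python
import os
import tempfile

def enqueue_from_stream(stream, queue_dir):
    """Copy one message from a binary stream into queue_dir; return the file path.

    Hypothetical helper; file naming is made up for this sketch.
    """
    os.makedirs(queue_dir, exist_ok=True)
    # Write under a temporary name first, then rename, so a reader of the
    # queue directory never sees a partially written '.msg' file.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, suffix='.tmp')
    with os.fdopen(fd, 'wb') as fp:
        fp.write(stream.read())
    final_path = tmp_path[:-len('.tmp')] + '.msg'
    os.rename(tmp_path, final_path)
    return final_path
```

When run under an MTA pipe, the stream would be `sys.stdin.buffer`; everything else in the process is interpreter startup and exit, which is the overhead being discussed.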

Looking at Postfix, what other options are readily available? I
suppose you could try to hook into the transport maps, but if I
understand them correctly, you're still talking about forking a
process per message. Maybe LMTP to a daemon process is another
option, but there appears to be no documentation on www.postfix.org
about LMTP.

As for what's best for other MTAs, that's a good question. I think
we'll always have to support delivery-to-program since that seems
like it's the lowest common method. If there are delivery mechanisms
that make more sense for specific MTAs, then I'm all for including
them, but others who are more familiar with those mail servers will
have to help (read: donate code).

-Barry

Carson Gaspar

4:15 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

--On Wednesday, September 27, 2006 11:54 PM -0400 Barry Warsaw <barry@python.org> wrote:

...

Looking at Postfix, what other options are readily available? I suppose you could try to hook into the transport maps, but if I understand them correctly, you're still talking about forking a process per message. Maybe LMTP to a daemon process is another option, but there appears to be no documentation on www.postfix.org about LMTP.

LMTP is fully supported in postfix. You just set mailbox_transport (or the right hand side of your transport map) to lmtp:... - see postfix's smtp(8). I've used it to deliver to cyrus imapd for a long time now. See RFC 2033 for the LMTP standard.

-- Carson

Barry Warsaw

4:40 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On Sep 28, 2006, at 12:15 AM, Carson Gaspar wrote:

...

--On Wednesday, September 27, 2006 11:54 PM -0400 Barry Warsaw <barry@python.org> wrote:

...
Looking at Postfix, what other options are readily available? I
suppose you could try to hook into the transport maps, but if I
understand them correctly, you're still talking about forking a process per message. Maybe LMTP to a daemon process is another option, but there
appears to be no documentation on www.postfix.org about LMTP.

LMTP is fully supported in postfix. You just set mailbox_transport
(or the right hand side of your transport map) to lmtp:... - see postfix's
smtp(8). I've used it to deliver to cyrus imapd for a long time now. See RFC
2033 for the LMTP standard.

Cool, thanks for the reference. I'll read up on that.

-Barry

Brad Knowles

4:25 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

At 11:54 PM -0400 9/27/06, Barry Warsaw wrote:

...

Looking at Postfix, what other options are readily available? I suppose you could try to hook into the transport maps, but if I understand them correctly, you're still talking about forking a process per message.

Use LMTP instead. This will allow you to completely avoid an intermediate (and unnecessary) queue-to-disk stage.

...

Maybe LMTP to a daemon process is another option, but there appears to be no documentation on www.postfix.org about LMTP.

Wietse didn't invent LMTP, he's just using the technique that was invented by others. Look in the sendmail source code, it includes an LDA that implements LMTP. For that matter, I think postfix also includes an LDA that implements LMTP.

LMTP is basically just exactly like SMTP, except that it's via the localhost interface only, and simplifies a number of other assumptions as well. So, if you've got code that can do SMTP, then you've already got code that can do LMTP.

Frankly, there's not much to LMTP, which I think is a large part of why you're not likely to see a great deal written about it within the sendmail and postfix codebases.
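One place where LMTP does differ from SMTP, per RFC 2033, is the end of DATA: the server sends one reply per previously accepted RCPT TO, in order, instead of a single reply for the whole message (and the greeting command is LHLO rather than HELO/EHLO). The following is a minimal sketch of just that step; `lmtp_data_replies` and the `deliver` callback are hypothetical, and the reply texts are illustrative.

```python
def lmtp_data_replies(accepted_rcpts, deliver):
    """Return one reply line per previously accepted RCPT TO, in order.

    `deliver` is a caller-supplied function (hypothetical) that returns True
    when the message was stored for that recipient.
    """
    replies = []
    for rcpt in accepted_rcpts:
        if deliver(rcpt):
            replies.append('250 2.0.0 <%s> delivered' % rcpt)
        else:
            # A temporary failure for this recipient only; the others are unaffected.
            replies.append('450 4.0.0 <%s> temporary failure' % rcpt)
    return replies
```

Because each recipient gets its own status, the receiving side does not need a queue of its own to retry individual recipients; the MTA keeps that responsibility.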

...

As for what's best for other MTAs, that's a good question. I think we'll always have to support delivery-to-program since that seems like it's the lowest common method. If there are delivery mechanisms that make more sense for specific MTAs, then I'm all for including them,

LMTP is probably the best and most native method for both sendmail and postfix. I can't speak for other MTAs.

...

but others who are more familiar with those mailer servers will have to help (read: donate code).

IIRC, sendmail has a Berkeley-style copyright, so the "donation" of code should not be an issue. Converting the code from C to Python, now that may be a bit more of a problem.

-- Brad Knowles, <brad@stop.mail-abuse.org>

Barry Warsaw

5:07 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On Sep 28, 2006, at 12:25 AM, Brad Knowles wrote:

...

At 11:54 PM -0400 9/27/06, Barry Warsaw wrote:

...
Looking at Postfix, what other options are readily available? I suppose you could try to hook into the transport maps, but if I understand them correctly, you're still talking about forking a process per message.

Use LMTP instead. This will allow you to completely avoid an
intermediate (and unnecessary) queue-to-disk stage.

The only problem I see is that I'm not sure Postfix can be configured
to deliver via LMTP on a per-recipient basis. It looks like you have
to specify LMTP via transports and so must deliver all mail for a
particular domain to that transport. That's not good because of
course we want Mailman to only receive messages for the aliases
corresponding to mailing lists, while other things like local
recipients or non-list aliases get delivered in the "normal way".

Or is there some way I'm missing that would allow us to segregate
some domain traffic to Mailman's LMTP server and other traffic to
Postfix's standard transports? What about Sendmail?

Other than that, we'd need a reliable standards-compliant LMTP server
written in Python (and no, smtpd.py- or Twisted-based versions are
not acceptable ;).

-Barry

Carson Gaspar

5:57 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

--On Thursday, September 28, 2006 1:07 AM -0400 Barry Warsaw <barry@python.org> wrote:

...

Or is there some way I'm missing that would allow us to segregate some domain traffic to Mailman's LMTP server and other traffic to Postfix's standard transports? What about Sendmail?

Shouldn't be an issue with postfix. From the default postfix transport map template:

# TABLE LOOKUP
#        With lookups from indexed files such as DB or DBM, or from
#        networked tables such as NIS, LDAP or SQL, patterns are
#        tried in the order as listed below:
#
#        user+extension@domain transport:nexthop
#               Mail for user+extension@domain is delivered through
#               transport to nexthop.
#
#        user@domain transport:nexthop
#               Mail for user@domain is delivered through transport
#               to nexthop.
#
#        domain transport:nexthop
#               Mail for domain is delivered through transport to
#               nexthop.
#
#        .domain transport:nexthop
#               Mail for any subdomain of domain is delivered
#               through transport to nexthop. This applies only
#               when the string transport_maps is not listed in the
#               parent_domain_matches_subdomains configuration
#               setting. Otherwise, a domain name matches itself and
#               its subdomains.
#
#        Note 1: the special pattern * represents any address (i.e.
#        it functions as the wild-card pattern).
#
#        Note 2: the null recipient address is looked up as
#        $empty_address_recipient@$myhostname (default:
#        mailer-daemon@hostname).
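The lookup order in that template is what makes per-recipient routing possible: a full-address entry wins over a domain entry. Below is a simplified Python sketch of that order. It is not Postfix's implementation: `lookup_transport` is a hypothetical function, it ignores case folding and the parent_domain_matches_subdomains setting, and the table contents in the usage note are made up.

```python
def lookup_transport(address, table):
    """Return the transport:nexthop string for `address`, or None if nothing matches.

    Simplified, hypothetical model of the lookup order quoted above.
    """
    local, _, domain = address.rpartition('@')
    candidates = [address]
    if '+' in local:
        # user+extension@domain is tried first, then user@domain
        candidates.append(local.split('+', 1)[0] + '@' + domain)
    candidates.append(domain)
    # .domain patterns match subdomains, so try each parent domain in turn
    labels = domain.split('.')
    for i in range(1, len(labels)):
        candidates.append('.' + '.'.join(labels[i:]))
    # Note 1 in the template: '*' matches any address
    candidates.append('*')
    for key in candidates:
        if key in table:
            return table[key]
    return None
```

With a table such as `{'mylist@example.com': 'lmtp:inet:localhost:8024', 'example.com': 'local:'}` (hypothetical entries), mail for mylist@example.com goes to the LMTP destination while alice@example.com still gets the domain's normal transport.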

Brad Knowles

7:10 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

At 10:57 PM -0700 9/27/06, Carson Gaspar wrote:

...

...
Or is there some way I'm missing that would allow us to segregate some domain traffic to Mailman's LMTP server and other traffic to Postfix's standard transports? What about Sendmail?

Shouldn't be an issue with postfix. From the default postfix transport map template:

Sendmail should be able to do something comparable, although I have not yet looked at the documentation to see just exactly how you'd implement that. Certainly, looking up addresses in a variety of database types is not a new idea.

-- Brad Knowles, <brad@stop.mail-abuse.org>

Tokio Kikuchi

1:52 p.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

Barry Warsaw wrote:

...

Other than that, we'd need a reliable standards-compliant LMTP server
written in Python (and no, smtpd.py- or Twisted-based versions are
not acceptable ;).

Why not smtpd.py? There is a MailmanProxy object in the code, which was written by you, Barry. Any SMTP server can become an LMTP server, IIRC.

http://docs.python.org/lib/node624.html

-- Tokio Kikuchi, tkikuchi@is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

Barry Warsaw

2:21 p.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On Sep 29, 2006, at 9:52 AM, Tokio Kikuchi wrote:

...

Barry Warsaw wrote:

...
Other than that, we'd need a reliable standards-compliant LMTP server written in Python (and no, smtpd.py- or Twisted-based versions are not acceptable ;).

Why no smtpd.py ? There is a MailmanProxy Object in the code which
was written by you, Barry. Any SMTP server can become a LMTP server IIRC.

http://docs.python.org/lib/node624.html

Only because smtpd.py is asyncore based and I'm not convinced it can
be made high performance enough. An smtpd.py based LMTP server could
provide an interesting proof of concept though.

-Barry

Tokio Kikuchi

October 2006

1:36 a.m.

New subject: pipe-to-prog/maidir/lmtp performance

Hi,

...

An smtpd.py based LMTP server could provide an interesting proof of concept though.

I've almost finished writing this primitive LMTP server.

Here is the result of an experiment: posting 100 messages to a list and measuring the arrival and departure times on my laptop coLinux (Debian) with Postfix-2.1.5/Python2.4. Time recording is based on the message headers and mail.log, so it is a very rough experiment.

              MTA:In    MTA:Out   MM:Out    sec

program:
  first       08:56:49  08:56:51  08:56:53    4
  last        08:56:54  08:57:27  08:57:38   44
  sec                5        36        49

maildir:
  first       08:48:13  08:48:14  08:48:16    3
  last        08:48:19  08:48:19  08:48:29   10
  sec                6         5        16

lmtp:
  first       09:12:26  09:12:26  09:12:27    1
  last        09:12:31  09:12:32  09:12:41   10
  sec                5         6        15

With the current default method (invoking the mailman post program and pipelining), it takes 49 seconds for the last (100th) message to reach the list member (a local user), while the maildir and lmtp interfaces reduce this time to 15 or 16 seconds. You can also see that most of the difference occurs between the MTA's In and Out times, which means that executing the mailman post program is the heaviest load. Also, passing the message from the MTA to mailman via maildir or lmtp is a relatively small task compared with processing the list message (cooking headers, appending footers, and so on).

Conclusion: Maildir or LMTP will not likely be a bottleneck.
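The end-to-end figures quoted above (49, 16, and 15 seconds) are the time from the first message entering the MTA to the last message leaving Mailman. A short sketch that recomputes them from the timestamps in the results; the helper name `seconds_between` is made up for this example, and the timestamps are copied from the figures above.

```python
from datetime import datetime

def seconds_between(start, end):
    """Elapsed whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = '%H:%M:%S'
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# (first MTA:In, last MM:Out) for each delivery method, from the results above
runs = {
    'program': ('08:56:49', '08:57:38'),
    'maildir': ('08:48:13', '08:48:29'),
    'lmtp': ('09:12:26', '09:12:41'),
}
totals = {method: seconds_between(first_in, last_out)
          for method, (first_in, last_out) in runs.items()}
# totals == {'program': 49, 'maildir': 16, 'lmtp': 15}
```

On these numbers the pipe-to-program path takes roughly three times as long as either alternative for the same 100 messages.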

Cheers,

-- Tokio Kikuchi tkikuchi@is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

Brad Knowles

2:11 a.m.

New subject: pipe-to-prog/maidir/lmtp performance

At 10:36 AM +0900 10/1/06, Tokio Kikuchi wrote:

...

...
An smtpd.py based LMTP server could provide an interesting proof of concept though.

I've almost finished writing this primitive LMTP server.

I am no longer on the list. Please do not include me in any future discussions on subjects relating to the list.

-- Brad Knowles, <brad@stop.mail-abuse.org>

Barry Warsaw

3:27 p.m.

New subject: pipe-to-prog/maidir/lmtp performance

On Sep 30, 2006, at 9:36 PM, Tokio Kikuchi wrote:

...

With the current default method of invoking mailman post program
and pipelining takes 49 seconds for the last 100th message to
reach the list member (a local user), while maildir and lmtp
interface reduce this time to 15 or 16 seconds. You can also see
most of the difference occurred in MTA's In and Out, which means
execution of mailman post program is the most heavy load. Also,
maildir and lmtp passing of message from MTA to mailman is a
relatively small task than the processing the list message like
cooking headers and appending footers and all like those.

Conclusion: Maildir or LMTP will not likely be a bottleneck.

Those are very interesting numbers, Tokio, thanks for doing the
experiment and posting the results. How much work is your LMTP
implementation doing when it receives the message? Is it parsing it
and storing the msg pickle? Is it touching any MailList objects?
Does it do the entire delivery pipeline? IIUC your results indicate
total throughput from submission to MTA to final delivery to user.
If that's the case, there's some common constant amount of work being
done and I'm just wondering how efficient the LMTP part will be.

When you feel confident about your lmtp implementation, go ahead and
check it in (probably in Mailman/bin/lmtp.py with hooks for mmshell
-- or I can do the latter). I think at this early date we should
make both LMTP and Maildir delivery possible, then we'll try to get
real-world feedback from users as to which we should ultimately
recommend.

-Barry

Tokio Kikuchi

1:12 a.m.

New subject: pipe-to-prog/maidir/lmtp performance

Hi,

Barry Warsaw wrote:

...

When you feel confident about your lmtp implementation, go ahead and check it in (probably in Mailman/bin/lmtp.py with hooks for mmshell -- or I can do the latter). I think at this early date we should make both LMTP and Maildir delivery possible, then we'll try to get real-world feedback from users as to which we should ultimately recommend.

I've committed it as LMTPRunner.py because it should be restarted by mailmanctl.

...

How much work is your LMTP implementation doing when it receives the message?

It does the same thing as MaildirRunner does, although it doesn't save misdirected messages; it only reports them in the error log.

...

If that's the case, there's some common constant amount of work being
done and I'm just wondering how efficient the LMTP part will be.

My impression is that it is almost the same as Maildir delivery.

-- Tokio Kikuchi, tkikuchi@is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

Barry Warsaw

8:03 p.m.

New subject: pipe-to-prog/maidir/lmtp performance

On Oct 1, 2006, at 9:12 PM, Tokio Kikuchi wrote:

...

I've committed it as LMTPRunner.py because it should be restarted by
mailmanctl.

...
How much work is your LMTP implementation doing when it receives the message?

It does the same thing as MaildirRunner does, although it doesn't
save misdirected messages; it only reports them in the error
log.

...
If that's the case, there's some common constant amount of work
being done and I'm just wondering how efficient the LMTP part
will be.

My impression is that it is almost the same as Maildir delivery.

Thanks Tokio! I saw your checkins and I'm going to play with this a
bit. There are a few things about virtual_mailbox_maps that I'm not
sure I like so LMTP delivery (if it can hold up efficiency-wise)
might be preferable all around.

Cheers,

-Barry

Nigel Metheringham

September 2006

8:11 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On Wed, 2006-09-27 at 23:25 -0500, Brad Knowles wrote:

...

LMTP is probably the best and most native method for both sendmail and postfix. I can't speak for other MTAs.

Exim can do LMTP, over a pipe (ie fork/exec program), a socket or TCP/IP.

Nigel.

-- [ Nigel Metheringham Nigel.Metheringham@InTechnology.co.uk ] [ - Comments in this message are my own and not ITO opinion/policy - ]

Barry Warsaw

12:12 p.m.

New subject: LTMP for incoming mail

I've changed the subject to more accurately reflect this thread.

On Sep 28, 2006, at 4:11 AM, Nigel Metheringham wrote:

...

On Wed, 2006-09-27 at 23:25 -0500, Brad Knowles wrote:

...
LMTP is probably the best and most native method for both sendmail and postfix. I can't speak for other MTAs.

Exim can do LMTP, over a pipe (ie fork/exec program), a socket or TCP/IP.

What I find really intriguing about this approach is the ability to
reject some messages immediately, presumably allowing the MTA to
bounce them. It's not clear how much work we could get away with at
message receipt time in our mythical LMTP server, but let's imagine
we could at least parse the message and do some preliminary sanity
checks on it. Say the message had MIME breakage or couldn't be
parsed. Or say we could do a quick check of the sender's permissions
to post, etc. We could reject the message then before it entered
Mailman's incoming queue.
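A rough sketch of such a receipt-time check, using only Python's standard `email` package. Everything here is illustrative: `precheck` is a hypothetical function, the allowed-sender set stands in for whatever permission lookup a list would really do, and the reply codes are example values rather than a recommendation.

```python
import email
from email.utils import parseaddr

def precheck(raw_bytes, allowed_senders):
    """Return an LMTP-style reply line for one message (illustrative codes).

    Hypothetical helper: a real server would consult the list's own settings.
    """
    msg = email.message_from_bytes(raw_bytes)
    # The parser records structural problems as defects instead of raising.
    for part in msg.walk():
        if part.defects:
            return '550 5.6.0 message could not be parsed cleanly'
    sender = parseaddr(str(msg.get('From', '')))[1].lower()
    if sender not in allowed_senders:
        return '550 5.7.1 sender not permitted to post'
    return '250 2.0.0 accepted'
```

Returning the 5xx line during the LMTP conversation leaves the bounce to the MTA, so a rejected message never reaches the incoming queue.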

What worries me most about this though is that we'd have to (probably
write and) maintain an LMTP server, and make sure that it was
efficient enough to keep up with incoming mail pressures, both steady
state and peak, and be highly robust against errors. Plus, this is
another moving part that mailmanctl would have to manage, and of
course, if our LMTP server breaks, your Mailman is no longer
accepting incoming mail. We should not underestimate the work
involved here.

I did a quick Google search to see if there were any GPL'd LMTP
servers we could piggyback on, the idea being that if we could find a
shell of a C program we could embed Python in and talk directly to
Mailman during the LMTP protocol. I found this one in the PLL project:

http://pll.sourceforge.net/

but it looks like the code is several years old so I doubt it's being
maintained, and it doesn't appear to have been released as a 1.0
tool. Postfix has an lmtp server, but it seems fairly heavyweight
(being tied into the smtp server) and it's not clear to me we could
combine our GPL code with Postfix's license.

Of course there's always Twisted, but that seems like a big bite to
take for this task.

ISTM that the trade-off then is rolling our own LMTP server vs. doing
maildir delivery. Are we confident that we can implement a high
performance enough server that would give us better throughput than
maildir would? In Python?

It might be fun to try, but OTOH it /is/ a distraction from other MM
2.2 work that needs to get done. So unless anybody has any leads on
existing GPL-compatible code we could use, or feels really motivated
to work on a Python version, I'm inclined to go with maildir for
MM2.2. It's not like we couldn't add LMTP at some later point.

-Barry


Brad Knowles

1:21 p.m.

New subject: LTMP for incoming mail

At 8:12 AM -0400 9/28/06, Barry Warsaw wrote:

...

What I find really intriguing about this approach is the ability to reject some messages immediately, presumably allowing the MTA to bounce them.

Yup.

...

We could reject the message then before it entered Mailman's incoming queue.

Indeed, that's a key advantage. IIRC, procmail does this with the system-wide and user-defined rulesets.

...

I did a quick Google search to see if there were any GPL'd LMTP servers we could piggyback on, the idea being that if we could find a shell of a C program we could embed Python in and talk directly to Mailman during the LMTP protocol.

Does it have to be GPL? Is a Berkeley-type license not okay? Checking the source for sendmail 8.13.8, I find that there is an official part of the package which includes the LDA "mail.local", which is LMTP-capable, among other things. It can also do user mailbox hashing, based on the username. You can either hash directly to a path like /var/mail/u/s/user or use an MD5 hash of the username in a base64 representation (changing "/" to "_"), and you can control how many levels of hashing are to be used.

Seems to me like this would be a pretty obvious candidate.
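
The two hashing styles mentioned above can be sketched in Python as follows. This is an illustration of the idea only, not a transcription of what mail.local does: the function names, the `/var/mail` default and the choice to take the leading characters of the encoded digest as directory levels are all assumptions.

```python
import base64
import hashlib


def hashed_by_name(user: str, levels: int = 2, root: str = "/var/mail") -> str:
    """Use the leading characters of the username as directory levels."""
    parts = list(user[:levels])
    return "/".join([root, *parts, user])


def hashed_by_md5(user: str, levels: int = 2, root: str = "/var/mail") -> str:
    """Use leading characters of a base64-encoded MD5 digest as levels."""
    digest = hashlib.md5(user.encode("utf-8")).digest()
    # Base64 output may contain "/", which cannot appear in a path component.
    encoded = base64.b64encode(digest).decode("ascii").replace("/", "_")
    parts = list(encoded[:levels])
    return "/".join([root, *parts, user])
```

With two levels, `hashed_by_name("user")` gives `/var/mail/u/s/user`, matching the example path above.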

...

Postfix has an lmtp server, but it seems fairly heavyweight (being tied into the smtp server) and it's not clear to me we could combine our GPL code with Postfix's license.

Please check out the sendmail mail.local stuff and tell me if this is a better alternative. If you need a different license, please let me know -- I've known Eric for many years (since way before the company existed). While I won't make any guarantees, I will say that if we need a different license, I imagine that I can get a more sympathetic ear than you might otherwise be able to find.

...

ISTM that the trade-off then is rolling our own LMTP server vs. doing maildir delivery. Are we confident that we can implement a high performance enough server that would give us better throughput than maildir would? In Python?

Dunno about doing it in Python, but I will say that going to Maildir as an additional queue-on-disk mechanism on top of everything else we're already doing seems to be a big step backward in terms of potential performance issues and I don't really see any significant positive benefit.

At AOL, we used to use a queue-on-disk method for the Internet mail gateways. Sendmail would take the incoming message, hand it off to a custom LDA, the custom LDA would then dump that in a disk queue asynchronously, then a synchronous queue runner process would come along and pick up the messages and send them over to Stratus. Believe me, this system sucked big time -- we had never ending problems with disk queues building up to the point where the queue runners could never possibly catch up, etc....

And I'm not seeing any real significant operational differences here between what you're talking about doing and what AOL abandoned years ago. Okay, so you're talking about using Maildir instead of a typical "linear" queue-on-disk and you don't have to do file locking to guarantee queue entry creation. But that's still dumping everything into a single directory, which we then have to scan and pull stuff out of, and you probably do still have to do some sort of file locking in order to make sure that the input and output queue mechanisms don't step on each other's toes.

...

It might be fun to try, but OTOH it /is/ a distraction from other MM 2.2 work that needs to get done. So unless anybody has any leads on existing GPL-compatible code we could use, or feels really motivated to work on a Python version, I'm inclined to go with maildir for MM2.2. It's not like we couldn't add LMTP at some later point.

The single queue directory on disk is already one of our biggest single bottlenecks. I don't see how using Maildir as a delivery mechanism from the MTA to Mailman is going to improve that.

In fact, it seems to me like we're just adding yet another bottleneck of exactly the same sort that we're trying to eliminate elsewhere, but with some additional drawbacks that are unique to Maildir and which will make our overall system performance even worse than it is today.

If we're going to make a big change, it seems to me that LMTP makes much more sense than Maildir. If we can't do LMTP, then I think we'd be much better off working on eliminating other bottlenecks in the system as opposed to adding yet another totally new source of bottlenecks that result from implementing Maildir.

It seems to me that this idea is a case of:

1.  We have to do Something.
2.  This is something.
3.  Therefore, we have to do This.

I think we want to think long and hard about this idea and all its potential drawbacks and new bottleneck sources, before we take that first step off the cliff.

-- Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."

 -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
 Assembly to the Governor, November 11, 1755

Founding Individual Sponsor of LOPSA. See <http://www.lopsa.org/>.

Ian Eiloart

1:49 p.m.

New subject: LTMP for incoming mail

--On 28 September 2006 08:21:05 -0500 Brad Knowles <brad@stop.mail-abuse.org> wrote:

...

...
What I find really intriguing about this approach is the ability to reject some messages immediately, presumably allowing the MTA to bounce them.

Yup.

...
We could reject the message then before it entered Mailman's incoming queue.

Indeed, that's a key advantage. IIRC, procmail does this with the system-wide and user-defined rulesets.

If that's a reason for using LMTP, then I'd prefer SMTP. Exim can call forward to an SMTP server to see if it will accept a message before it's too late to reject it at SMTP time. I don't think it can do that with LMTP, though I may be wrong.

So, with SMTP if Mailman were to reject a sender to a list, my MTA could reject it without causing a bounce.

-- Ian Eiloart IT Services, University of Sussex

Barry Warsaw

2:09 p.m.

New subject: LTMP for incoming mail


On Sep 28, 2006, at 9:21 AM, Brad Knowles wrote:

...

At 8:12 AM -0400 9/28/06, Barry Warsaw wrote:

Does it have to be GPL? Is a Berkeley-type license not okay?

GPL would be best, but Berkeley is probably okay. We'd probably want
to get confirmation of that from the FSF. The key thing is that it
has to be compatible with the GPL (and the Python Software License --
see below) so that we can combine the whole kit and caboodle.

...

Checking the source for sendmail 8.13.8, I find that there is an
official part of the package which includes the LDA "mail.local",
which is LMTP-capable, among other things. It can also do user
mailbox hashing, based on the username. You can either hash
directly to a path like /var/mail/u/s/user or use an MD5 hash of
the username in a base64 representation (changing "/" to "_"), and
you can control how many levels of hashing are to be used.

Seems to me like this would be a pretty obvious candidate.

So a sketch of the architecture would be to embed Python via its C
API into mail.local, adding callbacks at each point in the LMTP
protocol. From there, we'd write the rules in Python so that they'd
do the message parsing, sanity checking, etc, returning status codes
which mail.local would use to respond to the LMTP command. There has
to be that Python hook, otherwise you can't get to Mailman's data
structures.
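
The Python side of such an arrangement might look roughly like the sketch below: one callback per protocol command, each returning the status line the embedding server would send back. Everything here is hypothetical, including the `ListPolicy` class, its `on_*` method names and the `dispatch` helper. In the architecture described above the dispatching would live in C inside mail.local; it is written in Python here only to keep the example self-contained.

```python
class ListPolicy:
    """Hypothetical Python-side rules an embedding LMTP server would call."""

    def __init__(self, known_lists):
        self.known_lists = set(known_lists)

    def on_lhlo(self, arg):
        return "250 OK"

    def on_mail(self, arg):
        return "250 2.1.0 Sender OK"

    def on_rcpt(self, arg):
        # arg looks like "TO:<mylist@example.com>"
        address = arg.partition(":")[2].strip().strip("<>").lower()
        if address in self.known_lists:
            return "250 2.1.5 Recipient OK"
        return "550 5.1.1 No such list"


def dispatch(policy, line):
    """Map one protocol line to a callback and return its reply."""
    verb, _, arg = line.strip().partition(" ")
    handler = getattr(policy, "on_" + verb.lower(), None)
    if handler is None:
        return "500 5.5.1 Command not recognized"
    return handler(arg)
```

For example, `dispatch(ListPolicy({"mylist@example.com"}), "RCPT TO:<nolist@example.com>")` returns the 550 reply, so an unknown list is refused before any message data is transferred.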

...

...
Postfix has an lmtp server, but it seems fairly heavyweight (being tied into the smtp server) and it's not clear to me we could combine our GPL code with Postfix's license.

Please check out the sendmail mail.local stuff and tell me if this
is a better alternative. If you need a different license, please
let me know -- I've known Eric for many years (since way before the
company existed). While I won't make any guarantees, I will say
that if we need a different license, I imagine that I can get a
more sympathetic ear than you might otherwise be able to find.

Cool, that's good to know.

...

...
ISTM that the trade-off then is rolling our own LMTP server vs.
doing maildir delivery. Are we confident that we can implement a high performance enough server that would give us better throughput than maildir would? In Python?

Dunno about doing it in Python, but I will say that going to
Maildir as an additional queue-on-disk mechanism on top of
everything else we're already doing seems to be a big step backward
in terms of potential performance issues and I don't really see any
significant positive benefit.

I don't think it's an additional queue-on-disk mechanism, certainly
in comparison to what we're doing today. In fact, thinking about it
more, a maildir approach would be better than what we have today not
only because it eliminates the extra process fork/exec, but because
we could segregate queue/in files into per-list subdirectories. That
way, you're not dumping all messages destined for Mailman into one
directory. Not as good as directory hashing, but better than what we
have today.

I'll grant you that LMTP delivery has the potential to be the most
efficient mechanism by which messages get from the MTA into Mailman.
But it's certainly more work and more complicated than maildir; will
you grant that maildir is better than what we have today? Think of
it as a waystation on the road to the ultimate uber-performing list
server. :)

Let me just say that ideally, I think LMTP would be a great way to
go. It's not my top priority though. I'm looking for ways to get
more developers involved in the project, and this seems like a
perfect thing for someone seeking Mailman fame and fortune <wink>.
So, anyone care to take the challenge?

-Barry


Brad Knowles

4:06 p.m.

New subject: LTMP for incoming mail

At 10:09 AM -0400 9/28/06, Barry Warsaw wrote:

...

...
Does it have to be GPL? Is a Berkeley-type license not okay?

GPL would be best, but Berkeley is probably okay. We'd probably want to get confirmation of that from the FSF. The key thing is that it has to be compatible with the GPL (and the Python Software License -- see below) so that we can combine the whole kit and caboodle.

Are there any license questions or issues that we would need to have answered or confirmed by the Sendmail Consortium? Or should we wait on that until we've heard back from the FSF?

...

...
Dunno about doing it in Python, but I will say that going to Maildir as an additional queue-on-disk mechanism on top of everything else we're already doing seems to be a big step backward in terms of potential performance issues and I don't really see any significant positive benefit.

I don't think it's an additional queue-on-disk mechanism, certainly in comparison to what we're doing today.

Maildir was not designed as an efficient queue-on-disk strategy. It was designed to allow multiple simultaneous parallel deliveries to the NFS-mounted mailbox of a given user, and we know that it does a number of additional unnecessary things that seriously hurt its performance even in that relatively tightly defined context.

It does unnecessary file renames (which cause additional synchronous meta-data filesystem operations), it uses filenames that are too long and bust iname/inode caching schemes, and it doesn't make use of obvious significant performance-enhancing mechanisms like directory hashing.
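
For readers unfamiliar with the rename step being criticized, the conventional maildir delivery sequence can be sketched like this. It is a simplified illustration under stated assumptions: the file name format and the `maildir_deliver` helper are made up for the example, and real delivery agents differ in detail.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()


def maildir_deliver(maildir: str, data: bytes) -> str:
    """Write data into tmp/, then rename it into new/; return the final path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Long unique names of roughly this shape are conventional for maildir.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    name = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "xb") as fp:
        fp.write(data)
        fp.flush()
        os.fsync(fp.fileno())
    # The extra metadata operation: the file only becomes visible to
    # readers once it has been renamed into new/.
    os.rename(tmp_path, new_path)
    return new_path
```

Each delivery therefore costs a file creation in one directory plus a rename into another, which is the additional metadata traffic referred to above.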

It's pretty easy to design a mechanism that is much more efficient -- and scalable -- in handling multiple simultaneous deliveries to a user mailbox on NFS.

So why would we want to abuse a bad scheme for user-mailbox-on-NFS as an alternative scheme for queue-on-disk?

If we have queue-on-disk problems, why not solve them by implementing a more efficient queue-on-disk scheme, instead of abusing a poorly designed user-mailbox-on-NFS scheme?

...

That way, you're not dumping all messages destined for Mailman into one directory. Not as good as directory hashing, but better than what we have today.

That would be somewhat of an improvement in some respects, but Maildir also brings along a lot of additional baggage and I'm not at all convinced that it's worth the effort.

...

I'll grant you that LMTP delivery has the potential to be the most efficient mechanism by which messages get from the MTA into Mailman. But it's certainly more work and more complicated than maildir; will you grant that maildir is better than what we have today? Think of it as a waystation on the road to the ultimate uber-performing list server. :)

I'm not at all convinced that Maildir would be an overall improvement over what we have today. I think that adding a directory hashing scheme on a fork()/exec() model would probably be a bigger improvement than changing our inbox delivery mechanism from a fork()/exec() model and using Maildir instead.

At least by sticking with fork()/exec() and adding a directory hashing scheme on top of that, we wouldn't need to make any changes to the way we interface with MTAs today -- all the changes could be kept completely internal to Mailman. If we were to switch to Maildir as an inbox delivery method, not only would we have to change the way we interface with MTAs, we would also have to make internal changes to Mailman to support the use of Maildir as our queue-on-disk mechanism. That's a bigger overall change with bigger risk and relatively lower potential payoff.

If we were to work on implementing a directory hashing scheme instead of working on Maildir, we could still add LMTP at a later date.

That would allow us to go back at a later time and enhance the features that we provide to mailing list administrators, while also giving us time to look more deeply into the potential performance issues and make sure that we're not causing more problems than we're solving.

...

Let me just say that ideally, I think LMTP would be a great way to go. It's not my top priority though. I'm looking for ways to get more developers involved in the project, and this seems like a perfect thing for someone seeking Mailman fame and fortune <wink>.

I'm not convinced that this is an improvement.

...

So, anyone care to take the challenge?

I'm not a developer, but I do have experience with building large-scale mail and mailing list systems, and if you're willing to listen to me then I'm willing to give you the benefit of my experience.

IMO, Maildir is a Red Herring. The one and only reason to ever consider using Maildir is if you're implementing a large-scale IMAP mail server system and you're required to store user mailboxes on NFS.

Even then, you'd be well-served to look for better storage mechanisms, because throwing potentially hundreds of thousands of messages into a single directory is guaranteed to cause huge performance issues, even if every single mailbox operation didn't involve scanning the entire directory and doing a stat() on every single file, locking the entire directory, creating/renaming/deleting the file(s) as appropriate, and then unlocking the directory.

I think we're better off spending our resources working on trying to resolve the real bottleneck issues that we already know are present in our system as opposed to working on cool stuff that may or may not help but would require more overall changes to more parts of the system and with relatively lower potential payoff.

-- Brad Knowles, <brad@stop.mail-abuse.org>


Dale Newfield

4:16 p.m.

New subject: LTMP for incoming mail

Brad Knowles wrote:

...

I think we're better off spending our resources working on trying to resolve the real bottleneck issues that we already know are present in our system as opposed to working on cool stuff that may or may not help but would require more overall changes to more parts of the system and with relatively lower potential payoff.

While I agree with Brad some days more than others, he has a good point here. Presumably every mailman installation is a multiplier for email messages (I.E.: many more go out than come in), so if we're worrying about the bottlenecks at the inbound side, I'm willing to bet they result in much more important bottlenecks on the outbound side...

-Dale

John A. Martin

5:29 p.m.

New subject: LTMP for incoming mail

...

"Brad" == Brad Knowles "Re: [Mailman-Developers] LTMP for incoming mail" Thu, 28 Sep 2006 11:06:29 -0500

Brad> Maildir was not designed as an efficient queue-on-disk
Brad> strategy.

Is it not possible to do Postfix virtual mailbox domains _without_ maildir style delivery? (Considering Postfix virtual mailbox domains is what led to this, no?) Doesn't the example given in the Postfix VIRTUAL_README show delivery to a mailbox on line 11 and delivery to a maildir on line 12? Why not consider Postfix virtual mailbox domains and their workalikes with other MTAs separately and independently from the choices of delivery format and local delivery agents, some of which may also be chosen independently?

Independently of the above and at least for Postfix, would it be worthwhile looking at the Postfix policy daemon plug-in as a way to query Mailman information during the rfc821 conversation and rejecting a lot of messages before DATA? Even more interesting, because of perhaps being portable, might be the milter facility that appeared in Postfix 2.3. I have not yet even looked at the latter, so it may AFAIK be very wide of the mark.
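
To make the policy daemon idea concrete: Postfix policy delegation sends a block of name=value lines terminated by an empty line and expects an `action=...` line, also followed by an empty line, in reply. The Python sketch below handles one such request. The `POSTERS` table is a hypothetical stand-in for whatever Mailman would be asked, and the socket handling a real daemon needs is omitted.

```python
# Hypothetical stand-in for asking Mailman who may post to which list.
POSTERS = {"mylist@example.com": {"anne@example.com"}}


def parse_policy_request(text: str) -> dict:
    """Parse one request: name=value lines terminated by an empty line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def policy_reply(attrs: dict) -> str:
    """Return the action for a request, terminated by an empty line."""
    if attrs.get("protocol_state") != "RCPT":
        return "action=DUNNO\n\n"
    allowed = POSTERS.get(attrs.get("recipient", "").lower())
    if allowed is None:
        # Not a list address we know about: express no opinion.
        return "action=DUNNO\n\n"
    if attrs.get("sender", "").lower() in allowed:
        return "action=DUNNO\n\n"
    return "action=REJECT Sender not permitted to post to this list\n\n"
```

Because the decision is made at RCPT time, a refused sender is turned away before DATA and no bounce message has to be generated.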

jam

Brad Knowles

8:21 p.m.

New subject: LTMP for incoming mail

At 1:29 PM -0400 9/28/06, John A. Martin wrote:

...

Is it not possible to do Postfix virtual mailbox domains _without_ maildir style delivery?

Probably, but I'm not sure it really buys us much of anything to have separate mailboxes for each list, as opposed to a queue processing mechanism that is generally more robust and capable of easily handling lots of simultaneous transactions.

...

Independently of the above and at least for Postfix, would it be worthwhile looking at the Postfix policy daemon plug-in as a way to query Mailman information during the rfc821 conversation and rejecting a lot of messages before DATA?

That's going to have the same issues as implementing LMTP, at least as far as it concerns performance and ability to handle these kinds of operations on a large-scale/real-time basis.

...

Even more interesting, because of perhaps being portable, might be the milter facility that appeared in Postfix 2.3. I have not yet even looked at the latter, so it may AFAIK be very wide of the mark.

Same issues for all. The particular protocol is not particularly relevant to this part of the discussion. Regardless of how you implement the system, it's going to have to do certain types of message scanning and parsing, checking against databases and/or blacklists, etc.... In all cases, so long as you're holding the sender open while you check all these things, you're going to have pretty much the same concerns regarding performance and scalability.

-- Brad Knowles, <brad@stop.mail-abuse.org>


Barry Warsaw

6:40 p.m.

New subject: LTMP for incoming mail


On Sep 28, 2006, at 12:06 PM, Brad Knowles wrote:

...

Are there any license questions or issues that we would need to have
answered or confirmed by the Sendmail Consortium? Or should we
wait on that until we've heard back from the FSF?

I would ask them if their license is GPL compatible. IOW, do they
believe we can combine GPL code with theirs? Better yet would be
cases where that's actually been done before.

...

Maildir was not designed as an efficient queue-on-disk strategy.
It was designed to allow multiple simultaneous parallel deliveries
to the NFS-mounted mailbox of a given user, and we know that it
does a number of additional unnecessary things that seriously hurt
its performance even in that relatively tightly defined context.

It does unnecessary file renames (which cause additional
synchronous meta-data filesystem operations), it uses filenames
that are too long and bust iname/inode caching schemes, and it
doesn't make use of obvious significant performance-enhancing
mechanisms like directory hashing.

It's pretty easy to design a mechanism that is much more efficient
-- and scalable -- in handling multiple simultaneous deliveries to
a user mailbox on NFS.

So why would we want to abuse a bad scheme for user-mailbox-on-NFS
as an alternative scheme for queue-on-disk?

If we have queue-on-disk problems, why not solve them by
implementing a more efficient queue-on-disk scheme, instead of
abusing a poorly designed user-mailbox-on-NFS scheme?

Remember, this discussion all started because Postfix virtual host
delivery is broken on the trunk. The virtual_mailbox_maps feature is
a new one since we last looked at how to integrate Mailman and
Postfix. What looks appealing about this is that we can actually pre-sort messages based on recipient address, and in fact we could pre-sort by domain, list name, and list alias. This really has a big
advantage in that Mailman's incoming runner can do less message
inspection to determine where that message is supposed to go. If a
file came from /usr/local/mailman/queue/in/mydomain/mylist/post, we
know immediately that it's destined for list members. Etc. We also
get a layout with fewer messages in more subdirectories for free.
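
A small Python sketch of that idea: given the directory a file was found in, the runner can recover the domain, list and alias without opening the message. The `QUEUE_ROOT` value follows the example path above, and `classify` is a hypothetical helper, not existing Mailman code.

```python
from pathlib import PurePosixPath

QUEUE_ROOT = PurePosixPath("/usr/local/mailman/queue/in")


def classify(maildir: str):
    """Split a per-address maildir path into (domain, listname, alias)."""
    # relative_to() raises ValueError if the path is not under the root.
    relative = PurePosixPath(maildir).relative_to(QUEUE_ROOT)
    if len(relative.parts) != 3:
        raise ValueError("expected domain/list/alias under the queue root")
    domain, listname, alias = relative.parts
    return domain, listname, alias
```

Here `classify("/usr/local/mailman/queue/in/mydomain/mylist/post")` yields `("mydomain", "mylist", "post")`, which is enough to route the file to the posting pipeline.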

virtual_mailbox_maps don't appear to be useful for delivery-to-program, and delivery-to-program with Postfix virtual domains as
we're doing them now has important disadvantages, most notably that
we have to create both an alias and a virtual recipient, and we have
to encode the domain name such that it's a valid alias, without
introducing additional collisions. That's icky.

I think John was asking about using virtual_mailbox_maps with
delivery to mbox, but I think that's worse, because mbox delivery
forces you to implement locking to avoid contention on the shared
file. So if we're going to utilize virtual_mailbox_maps I think
we're stuck with a maildir layout in queues/in.

...

...
I'll grant you that LMTP delivery has the potential to be the most efficient mechanism by which messages get from the MTA into Mailman. But it's certainly more work and more complicated than maildir; will you grant that maildir is better than what we have today? Think of it as a waystation on the road to the ultimate uber-performing list server. :)

I'm not at all convinced that Maildir would be an overall
improvement over what we have today. I think that adding a
directory hashing scheme on a fork()/exec() model would probably be
a bigger improvement than changing our inbox delivery mechanism
from a fork()/exec() model and using Maildir instead.

I don't think there's any way to really know without implementing it
and doing some measurements. Since we won't be losing delivery-to-program, that would be possible.

...

At least by sticking with fork()/exec() and adding a directory
hashing scheme on top of that, we wouldn't need to make any changes
to the way we interface with MTAs today -- all the changes could be
kept completely internal to Mailman. If we were to switch to
Maildir as an inbox delivery method, not only would we have to
change the way we interface with MTAs, we would also have to make
internal changes to Mailman to support the use of Maildir as our
queue-on-disk mechanism. That's a bigger overall change with
bigger risk and relatively lower potential payoff.

Nope, we simply have to implement a MaildirRunner to pull messages
out of queue/in using the directory layout format we decide on. We
have to do something anyway because the current Postfix integration
method for virtual domains is broken, and I think the fix is uglier
and more error prone than switching to a different integration
method. I have no problem continuing to maintain delivery-to-program
for other MTAs, or even Postfix where there's only a single domain.
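
The core of such a runner could be quite small. The sketch below uses the standard library's `mailbox.Maildir` class to pull every message out of one maildir and hand it to a callback. The `drain` function and its `handle` parameter are hypothetical, and a real MaildirRunner would also need error handling and a polling loop.

```python
import mailbox


def drain(maildir_path: str, handle) -> int:
    """Pass each message in the maildir to handle(), then remove it."""
    box = mailbox.Maildir(maildir_path, create=True)
    count = 0
    # Snapshot the keys so removal does not disturb iteration.
    for key in list(box.keys()):
        message = box.get_message(key)
        handle(message)
        box.remove(key)
        count += 1
    return count
```

In this sketch a message is removed only after `handle()` returns, so an exception in the handler leaves the file in place for a later pass.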

...

If we were to work on implementing a directory hashing scheme
instead of working on Maildir, we could still add LMTP at a later
date.

A directory hashing scheme is orthogonal to a maildir based queue/in
scheme. We should definitely do the former because it buys us
advantages for the other queues. We could definitely do LMTP later.
Or someone running a huge site that would really benefit from LMTP
could funnel a portion of their profits into paying us to add it <wink>.

...

...
Let me just say that ideally, I think LMTP would be a great way to go. It's not my top priority though. I'm looking for ways to get more developers involved in the project, and this seems like a perfect thing for someone seeking Mailman fame and fortune <wink>.

I'm not convinced that this is an improvement.

Was that a comment on the preceding paragraph? :)

...

...
So, anyone care to take the challenge?

I'm not a developer, but I do have experience with building large-scale mail and mailing list systems, and if you're willing to
listen to me then I'm willing to give you the benefit of my
experience.

Absolutely. But getting LMTP support into Mailman will still require
a developer to step up and write code. Maybe Tokio or Mark can be
convinced, or maybe there's another developer lurking out there who
would be interested. I just want to unbreak Postfix virtual domains
and then fry our bigger fish.

...

IMO, Maildir is a Red Herring. The one and only reason to ever
consider using Maildir is if you're implementing a large-scale IMAP
mail server system and you're required to store user mailboxes on NFS.

I think the thing you're missing is that we need to get the messages
from the MTA into Mailman's incoming queue /somehow/, and we're
basically limited by what the various MTAs have to offer. This is
primarily an integration issue, so it's necessarily MTA-specific,
even if we were to do nothing and stick with delivery-to-program. We
can -- and must -- do better with Postfix virtual domains, and as I
see it, using virtual_mailbox_maps with maildir delivery is the best
option available. I'm still open to other suggestions but as yet, I
don't see a better way.

BTW, all this discussion of Postfix integration should not make Exim,
Sendmail, or qmail users feel left out! If there are better ways to
get mail from those MTAs into Mailman's incoming queue, I'm all for
improving those integration points too. I just need guidance from
those more knowledgeable with those MTAs as to what changes we should
make, if any. We're not playing favorites, and we're not going to
make any design choices that would improve Postfix integration at the
expense of other MTAs.

...

I think we're better off spending our resources working on trying
to resolve the real bottleneck issues that we already know are
present in our system as opposed to working on cool stuff that may
or may not help but would require more overall changes to more
parts of the system and with relatively lower potential payoff.

Fixing Postfix virtual domain integration is a real problem that
needs solving, which is how this whole thread started.

-Barry


John A. Martin

8:41 p.m.

New subject: LTMP for incoming mail

...

"baw" == Barry Warsaw "Re: [Mailman-Developers] LTMP for incoming mail" Thu, 28 Sep 2006 14:40:51 -0400

baw> I think John was asking about using virtual_mailbox_maps with
baw> delivery to mbox, but I think that's worse, because mbox
baw> delivery forces you to implement locking to avoid contention
baw> on the shared file.  So if we're going to utilize
baw> virtual_mailbox_maps I think we're stuck with a maildir
baw> layout in queues/in.

Reading the fine manual, VIRTUAL(8), clears some of my careless clueless misconceptions. Postfix virtual mailbox domains are implemented by the virtual(8) delivery agent. It is based upon the Postfix local(8) delivery agent. Perhaps Mailman would like a Postfix delivery agent that uses the virtual_mailbox_maps and friends like virtual(8) does but puts the messages to a structure specifically designed to Mailman's liking. In the interim, would it be possible when using maildirs in this application to avoid some of the gyrations that are ordinarily done for a normal MUA application?

jam

Brad Knowles

8:59 p.m.

New subject: LTMP for incoming mail

At 2:40 PM -0400 9/28/06, Barry Warsaw wrote:

...

I would ask them if their license is GPL compatible. IOW, do they believe we can combine GPL code with theirs? Better yet would be cases where that's actually been done before.

I'll send a note and ask.

...

Remember, this discussion all started because Postfix virtual host delivery is broken on the trunk. The virtual_mailbox_maps feature is a new one since we last looked at how to integrate Mailman and Postfix.

But this is a pure postfix issue, and now we're talking about making potentially large architectural changes to the system to support this one MTA, and without necessarily giving consideration to whether that buys us anything for any of the other MTAs.

I understand that integration with postfix is sub-optimal today, but I'm not convinced that it makes sense to seriously consider an option that may result in throwing out all the other MTAs, in order to fix things with postfix. Worse yet would be trying to maintain two different systems, one for postfix and one for everyone else. Or making architectural changes to support the new stuff for postfix, which may hurt us for other MTAs.

At the very least, I think it makes sense to look at the overall cost/benefit ratio.

Let's assume that we have two systems that are otherwise identical, with roughly equivalent traffic. System A has a single big list, while system B has a number of smaller lists, but the overall aggregate traffic is equal.

With a Maildir solution, system A will see no benefit to the inbound queueing, because you're going to get the same level of contention within a single inbound directory for the one big list as you would for a single inbound directory for all lists on the system. System B would get a benefit, since each list would not be competing for immediate synchronous meta-data update resources with the other lists, although there would still be some intra-list competition.

With a hashed directory solution, both systems would see the same level of benefit, and intra-list competition would be no worse than inter-list competition. And if the competition were to get too high, you simply increase the level of directory hashing.
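To make the hashed directory idea concrete, here is a minimal sketch of how a queue file name could be mapped to a hashed subdirectory. Nothing here is from the Mailman source: the function name hashed_queue_path, the use of SHA-1, and the levels/width parameters are all assumptions chosen for illustration.

```python
import hashlib
import os


def hashed_queue_path(queue_root, filebase, levels=2, width=2):
    """Return a path for filebase under `levels` hashed subdirectories.

    Hypothetical helper, not Mailman code. Each level uses `width` hex
    digits of the SHA-1 digest of the file name, so files spread across
    16**width directories per level regardless of which list they are for.
    """
    digest = hashlib.sha1(filebase.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(queue_root, *parts, filebase)
```

Raising levels (or width) is the "increase the level of directory hashing" knob: the same file name always maps to the same directory, but each directory holds fewer entries.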

With a Maildir solution, you give up your ability to implement a hashed directory solution, because the MTA would no longer know how to write messages to your hashed mailbox-directory-per-list, and to get around that you'd have to have some sort of customized local delivery agent no matter what.

With a hashed directory solution, if necessary or desirable you could still implement a separate directory tree per list within your customized local delivery agent, and that directory tree per list could look however you want.

Moreover, a Maildir mailbox-per-list solution doesn't do anything for outbound queues, whereas a properly implemented hashed directory solution should affect outbound at least as much as inbound, at no additional implementation cost.

...

virtual_mailbox_maps don't appear to be useful for delivery-to-program, and delivery-to-program with Postfix virtual domains as we're doing them now has important disadvantages, most notably that we have to create both an alias and a virtual recipient, and we have to encode the domain name such that it's a valid alias, without introducing additional collisions. That's icky.

I know that our current solution is sub-optimal, but I'm not convinced that it's the only way to skin this cat. Moreover, I'm also not convinced that Maildir is the only effective way to make use of virtual_mailbox_maps.

I am pretty much convinced that using Maildir will effectively preclude the ability to make use of directory hashing, precisely because you're letting the MTA write directly to a poor standard interface instead of handling the internal issues in a manner that is opaque to the MTA.

...

I don't think there's any way to really know without implementing it and doing some measurements. Since we won't be losing delivery-to-program, that would be possible.

True enough, but there is a cost in terms of lost opportunity, and pushing out the delivery schedule by long enough to determine which method is going to work better overall.

...

Nope, we simply have to implement a MaildirRunner to pull messages out of queue/in using the directory layout format we decide on.

With Maildir, you don't have any choice in what the directory layout will look like. That's standardized within the Maildir implementation, and you can't change that. Otherwise, you wouldn't be using Maildir anymore, you'd be using mailbox-directory-solution-that-looks-kinda-semi-sorta-like-maildir-but-modified-by-Mailman-and-incompatible-with-everything-else.

...

A directory hashing scheme is orthogonal to a maildir based queue/in scheme.

I'm not convinced of that. In fact, I'm convinced that they are pretty much mutually exclusive. That is, unless you're talking about using Maildir as a second level of queue-on-disk, before you get to the Mailman-internal queue-on-disk mechanism.

Now, if you are talking about two levels of queue-on-disk so that we can get both Maildir and queue directory hashing, I think that's going to be much, much worse than sticking with the existing postfix virtual domain solution.

...

Or someone running a huge site that would really benefit from LMTP could funnel a portion of their profits into paying us to add it <wink>.

I don't think we're doing enough traffic on python.org for them to justify paying for it. I don't think that Apple is doing enough traffic with Mailman for them to justify paying for it -- not with what we've heard about how the new(er) MacOS X hardware is performing, and especially not with the total lack of any support (or even acknowledgement) that we get from the corporate types.

I don't think that any of the open-source projects (like FreeBSD) are going to be in a position to pay for something like this, or to develop & contribute the necessary code, although they might be doing enough traffic that they could certainly use these features if they were available.

I think that only leaves us with a site like SourceForge, and I think you've probably got better contacts there than any of the rest of us.

...

Absolutely. But getting LMTP support into Mailman will still require a developer to step up and write code.

I'm not that concerned about LMTP. I think that's a big enough issue that we can leave that alone for now.

...

Maybe Tokio or Mark can be convinced, or maybe there's another developer lurking out there who would be interested. I just want to unbreak Postfix virtual domains and then fry our bigger fish.

I would like to see them unbroken, but I also don't want to see anything done that would preclude the use of hashed queue directories, or that would add a second level of queue-on-disk and yet another source of potential bottlenecks.

...

I think the thing you're missing is that we need to get the messages from the MTA into Mailman's incoming queue /somehow/, and we're basically limited by what the various MTAs have to offer.

I certainly was not understanding your point that you wanted to use this as a way to unbreak postfix virtual domains, no.

No, I didn't get that at all.

I'm still not convinced that this is the best way to unbreak postfix virtual domains, however.

...

Fixing Postfix virtual domain integration is a real problem that needs solving, which is how this whole thread started.

Agreed, this is a real problem that needs to be resolved.

-- Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."

 -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
 Assembly to the Governor, November 11, 1755

Founding Individual Sponsor of LOPSA. See <http://www.lopsa.org/>.

Barry Warsaw

11:33 p.m.

New subject: LTMP for incoming mail

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On Sep 28, 2006, at 4:59 PM, Brad Knowles wrote:

...

...
Remember, this discussion all started because Postfix virtual host delivery is broken on the trunk. The virtual_mailbox_maps feature is a new one since we last looked at how to integrate Mailman and Postfix.

But this is a pure postfix issue, and now we're talking about
making potentially large architectural changes to the system to
support this one MTA, and without necessarily giving consideration
to whether that buys us anything for any of the other MTAs.

No, we're not. The /only/ difference to support this will be a new
incoming queue runner. Actually, not even that. We'll just be
modifying the existing MaildirRunner to work better with the file
system layout that I propose Postfix's virtual_mailbox_maps should be
configured to use.

MaildirRunner /only/ manages queue/in. Once the message is pulled
from the incoming queue, everything else will be MTA agnostic, just
as it is now. No other queue will be maildir based (just as they
aren't now).

It's just one class, and if you don't have Postfix or don't have
virtual domains, or just don't like the idea at all, then you just
keep the regular IncomingRunner you've got now and keep delivering to
the post program. Nothing changes.

-Barry

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRRxb0HEjvBPtnXfVAQJMWgQAidKPtOdIHStZtK2ONcjWTMZKOUJaHsnm XxNNHdeycnDQY6zzZMnov5QFLT0IDr9a5ASuMnd/XxZy3iHPL8By34Hc7n8+Yly+ b8u7FCN+vbOnuf+s1IoDaQETd05X0AYtdIkdmpfQvassfENmTKGLIp1LqKv8WcGG 4XCniTRWtr8= =e7G9 -----END PGP SIGNATURE-----

Barry Warsaw

11:40 p.m.

New subject: LTMP for incoming mail

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On Sep 28, 2006, at 4:59 PM, Brad Knowles wrote:

...

I know that our current solution is sub-optimal, but I'm not
convinced that it's the only way to skin this cat. Moreover, I'm
also not convinced that Maildir is the only effective way to make
use of virtual_mailbox_maps.

It may not be. And maybe there's some way other than
virtual_mailbox_maps to better skin the Postfix virtual domain cat.
I haven't heard it yet though.

...

I am pretty much convinced that using Maildir will effectively
preclude the ability to make use of directory hashing, precisely
because you're letting the MTA write directly to a poor standard
interface instead of handling the internal issues in a manner that
is opaque to the MTA.

Not at all. If we implement directory hashing, all the other queues
will gain from them equally, regardless of what MTA is being used,
because after the incoming queue, the MTA doesn't figure into it.
The incoming queue has always been a bit special anyway.

...

With Maildir, you don't have any choice in what the directory
layout will look like. That's standardized within the Maildir
implementation, and you can't change that. Otherwise, you wouldn't
be using Maildir anymore, you'd be using mailbox-directory-solution-that-looks-kinda-semi-sorta-like-maildir-but-modified-by-Mailman-and-incompatible-with-everything-else.

Except that above the maildirs, you could partition the directories
by domain/list/recipient. At least, that's the way I read the docs
for virtual_mailbox_maps. I haven't yet tried it though, so maybe it
doesn't work the way I expect it to.
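As a rough illustration of the domain/list/recipient partitioning, the sketch below computes the value a virtual_mailbox_maps entry might carry for a given address. It relies on one documented Postfix behaviour (a lookup result ending in "/" is delivered maildir-style, relative to virtual_mailbox_base); everything else, including the function name maildir_for, the queue/in root, the "post" directory for the bare list address, and the suffix list (borrowed from the commit message quoted at the top of the thread), is an assumption.

```python
# Sub-addresses borrowed from the commit message at the top of the thread;
# this is an illustrative set, not a confirmed list of what Mailman accepts.
SUFFIXES = ("admin", "bounces", "confirm", "join", "leave", "owner", "request")


def maildir_for(address, root="queue/in"):
    """Map list-suffix@domain to root/domain/list/suffix/ (hypothetical layout)."""
    local, _, domain = address.lower().partition("@")
    listname, sep, suffix = local.rpartition("-")
    if not sep or suffix not in SUFFIXES:
        # No recognised suffix: treat the whole local part as the list name.
        listname, suffix = local, "post"
    # The trailing "/" is what asks Postfix for maildir-style delivery.
    return "/".join((root, domain, listname, suffix)) + "/"
```

For example, maildir_for("my-list-bounces@example.com") gives "queue/in/example.com/my-list/bounces/", so each list and sub-address gets its own maildir under its domain.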

...

...
A directory hashing scheme is orthogonal to a maildir based queue/in scheme.

I'm not convinced of that. In fact, I'm convinced that they are
pretty much mutually exclusive. That is, unless you're talking
about using Maildir as a second level of queue-on-disk, before you
get to the Mailman-internal queue-on-disk mechanism.

Now, if you are talking about two levels of queue-on-disk so that
we can get both Maildir and queue directory hashing, I think that's
going to be much, much worse than sticking with the existing
postfix virtual domain solution.

This maildir proposal /only affects the incoming queue/ which has
always been special anyway. None of the other queues will be
maildir, and they can all be hashed directories. The two are
completely independent.

-Barry

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRRxdh3EjvBPtnXfVAQKkcwP/aJYv25I8jt2sh6YizKTB7iz8VKEenLRY 04DsuGv4RwZZqxsMO1a385XyjXGo7229mrdU14PHhoNMQ1MGoYPYAKVWc8B05n8X aBWUqNdnwobAnOTWMlK1TNY85wb5nWQ4O9KIgi/HmEz+9r/ujPtKUgeCLL3YyDvJ XEH9zOL7GOU= =rqoL -----END PGP SIGNATURE-----

John W. Baxter

5:01 p.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

On 9/28/06 1:11 AM, "Nigel Metheringham" <Nigel.Metheringham@dev.intechnology.co.uk> wrote:

...

On Wed, 2006-09-27 at 23:25 -0500, Brad Knowles wrote:

...
LMTP is probably the best and most native method for both sendmail and postfix. I can't speak for other MTAs.

Exim can do LMTP, over a pipe (ie fork/exec program), a socket or TCP/IP.

Exim can, indeed. But for some cases only if built with a special build flag. From the (4.6.1) spec:

The lmtp transport runs the LMTP protocol (RFC 2033) over a pipe to a specified command or by interacting with a Unix domain socket. This transport is something of a cross between the pipe and smtp transports. Exim also has support for using LMTP over TCP/IP; this is implemented as an option for the smtp transport. Because LMTP is expected to be of minority interest, the default build-time configure in src/EDITME has it commented out. You need to ensure that TRANSPORT_LMTP=yes is present in your Local/Makefile in order to have the lmtp transport included in the Exim binary.

However, we seem to be interested in LMTP over TCP (to localhost), and I *think* that is available without the TRANSPORT_LMTP=yes build.

As one data point, the Exim (4.54) shipped with CentOS-4.4 is built without the TRANSPORT_LMTP flag:

# exim -bV
...
Transports: appendfile/maildir autoreply pipe smtp
...

A quick test with exim -bV -C testConfig suggests that the protocol = lmtp option in an smtp transport is at least not a syntax error (and I believe it will work).

--John

Brad Knowles

2:34 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

At 3:04 PM -0400 9/27/06, Barry Warsaw wrote:

...

This appears to allow us to set up true virtual domains without having to encode destination aliases. The trick though is that we would use Maildir delivery for all incoming messages, something I'm keen on switching to for Mailman 2.2 anyway. Maildir is way more efficient than invoking a mail program per incoming message, Mailman already supports Maildir (although it isn't the default), and AFAIK all major MTAs support Maildir.

Sendmail knows nothing of the mailbox delivery method. That is left up to the Local Delivery Agent, which is usually /bin/mail and knows about 7th edition mbox-format, and not much else. You could always substitute a different LDA (e.g., procmail), but that would not be a standard part of sendmail.

Moreover, I'm not keen on Maildir. It makes a lot of trade-offs to try to get something that is NFS-safe, and I'm not convinced those trade-offs are worthwhile, especially not in a non-NFS environment. I think there are other ways that you could get the same benefits (in a mailbox directory solution), without getting the major drawbacks of Maildir per se.

Among other things Maildir creates really hairy long filenames, which can easily blow the iname/inode caching built into most filesystems -- you could get the same benefit by using a better filename naming/hashing scheme with fewer characters. It also does a lot of excessive synchronous meta-data operations (e.g., creating files, renaming files, etc...), and that can place a heavy load on the underlying filesystem.

In his paper regarding what they built for the EarthLink mail system, Nick Christenson has clearly proven that you can use the atomic creat() system call in a way that eliminates the need for file locking on NFS, without all the various baggage that Maildir brings to the table.
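For readers unfamiliar with the technique, here is a minimal sketch of exclusive file creation used in place of a lock. It is not taken from Christenson's paper; the helper name create_exclusive and the retry-over-candidate-names policy are assumptions, and whether O_EXCL is actually atomic on a given NFS setup depends on the NFS version and client.

```python
import os


def create_exclusive(directory, candidates):
    """Create the first name in `candidates` that does not exist yet.

    Hypothetical helper. O_CREAT | O_EXCL makes the open fail if the name
    is already taken, so two writers can never end up with the same file
    and no separate lock file is needed.
    """
    for name in candidates:
        path = os.path.join(directory, name)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # someone else owns this name; try the next one
        return path, fd
    raise RuntimeError("no free name among the candidates")
```

The caller writes through the returned descriptor and closes it; a writer that loses the race simply moves on to another name instead of waiting on a lock.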

Mark Crispin also has a lot of good things to say about the weaknesses inherent in Maildir. You should read his comments, too.

-- Brad Knowles, <brad@stop.mail-abuse.org>


Brad Knowles

2:45 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

At 9:34 PM -0500 9/27/06, Brad Knowles wrote:

...

Moreover, I'm not keen on Maildir. It makes a lot of trade-offs to try to get something that is NFS-safe, and I'm not convinced those trade-offs are worthwhile, especially not in a non-NFS environment.

One other problem with Maildir -- it throws all message files into the same directory, and doesn't use a hashed directory scheme. If we're going to do directory hashing, I would think we'd want to do it within our mailbox storage mechanism as well as elsewhere in the queueing system.

...

In his paper regarding what they built for the EarthLink mail system, Nick Christenson has clearly proven that you can use the atomic creat() system call in a way that eliminates the need for file locking on NFS, without all the various baggage that Maildir brings to the table.

If you want to read Nick's paper, go to <http://www.jetcafe.org/npc/doc/mail_arch.html>. Note that he's also the author of the book _sendmail Performance Tuning_, see <http://www.jetcafe.org/npc/book/sendmail/>.

...

Mark Crispin also has a lot of good things to say about the weaknesses inherent in Maildir. You should read his comments, too.

Mark's comments can be read at <http://www.washington.edu/imap/documentation/formats.txt.html>. Do a "find" on the page for "maildir", and read from there.

-- Brad Knowles, <brad@stop.mail-abuse.org>


Barry Warsaw

3:56 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On Sep 27, 2006, at 10:34 PM, Brad Knowles wrote:

...

At 3:04 PM -0400 9/27/06, Barry Warsaw wrote:

...
This appears to allow us to set up true virtual domains without having to encode destination aliases. The trick though is that we would use Maildir delivery for all incoming messages, something I'm keen on switching to for Mailman 2.2 anyway. Maildir is way more efficient than invoking a mail program per incoming message, Mailman already supports Maildir (although it isn't the default), and AFAIK all major MTAs support Maildir.

Sendmail knows nothing of the mailbox delivery method. That is left up to the Local Delivery Agent, which is usually /bin/mail and knows about 7th edition mbox-format, and not much else. You could always substitute a different LDA (e.g., procmail), but that would not be a standard part of sendmail.

I'm definitely not proposing to get rid of deliver to program, so at
worst, Sendmail users will continue to use this method. Is there a
better way to get the message from Sendmail into Mailman's incoming
queue?

-Barry

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRRtH6XEjvBPtnXfVAQLnawP9FhQDaHH4TtFlV2oo/FwT1YipNuxl3Kr1 vjswKQtKgQN7QUYuSZwnOSZ3O7PBHjzSNXbuu2GQt2hEYYm0VQkbO4I173EO4HKR xSkPGDrBU6n3NDC3WjV7BSedBSHlPnJXmnEsTobxeUQTCeeJRwsxO3QdPef22kn5 qmVOmMwfkBw= =0R2g -----END PGP SIGNATURE-----

Brad Knowles

4:27 a.m.

New subject: [Mailman-checkins] SF.net SVN: mailman: [8041] trunk/mailman/Mailman

At 11:56 PM -0400 9/27/06, Barry Warsaw wrote:

...

I'm definitely not proposing to get rid of deliver to program, so at worst, Sendmail users will continue to use this method. Is there a better way to get the message from Sendmail into Mailman's incoming queue?

Well, sendmail speaks LMTP to the custom LDA that is shipped alongside it, which is not officially part of the sendmail package itself. But you could easily plug in any other LMTP-capable LDA.

-- Brad Knowles, <brad@stop.mail-abuse.org>


emf

3:19 a.m.

New subject: Incoming Queue format

Brad Knowles wrote:

...

Among other things Maildir creates really hairy long filenames, which can easily blow the iname/inode caching built into most filesystems

I can't find a filesystem that has a filename dependency for inode caching, so I suspect I'm completely misunderstanding this. Could you expand on that a bit?

...

-- you could get the same benefit by using a better filename naming/hashing scheme with fewer characters. It also does a lot of excessive synchronous meta-data operations (e.g., creating files, renaming files, etc...), and that can place a heavy load on the underlying filesystem.

Maybe; but there are at least two filesystems (XFS, reiserfs) and likely more that handle file renaming/creating really cheaply, and have their own ninja ways of dealing with really large directories that are the product of a rather large amount of coding hours.

Maildir has the advantage of being bog standard and readily comprehended. While I'm all in favor of some lmtp delivery mechanism, I don't see why we should continue inventing our own queue-on-disk approach merely to cater to poorly designed filesystems.

It seems to me like anyone likely to end up with a huge enough incoming mailman queue to care about Maildir's inefficiencies would also be able to put a sensible filesystem underneath it.

~ethan fremen

Brad Knowles

4:39 a.m.

New subject: Incoming Queue format

At 11:19 PM -0400 9/28/06, emf wrote:

...

I can't find a filesystem that has a filename dependency for inode caching, so I suspect I'm completely misunderstanding this. Could you expand on that a bit?

Some filesystems implement an in-memory hash of recently accessed files, but the filenames are typically truncated to fourteen characters, and the paths to the files may likewise be truncated.

...

Maybe; but there are at least two filesystems (XFS, reiserfs) and likely more that handle file renaming/creating really cheaply, and have their own ninja ways of dealing with really large directories that are the product of a rather large amount of coding hours.

XFS and ReiserFS do not comprise the entire universe of all filesystems in the world in which Mailman will be operated.

There will be plenty of BSD, Solaris, HP-UX, MacOS X, and other OSes where Mailman will be used, and even on Linux you're much more likely to run into ext2fs or ext3fs than either XFS or ReiserFS on most of the several hundred distributions that are available.

...

Maildir has the advantage of being bog standard and readily comprehended. While I'm all in favor of some lmtp delivery mechanism, I don't see why we should continue inventing our own queue-on-disk approach merely to cater to poorly designed filesystems.

While XFS and ReiserFS may have their advantages (and XFS on SGI Irix is much better than XFS on Linux), we can't assume that any portion of the Mailman community will be using these kinds of filesystems. We must be more conservative in our estimates of what filesystem features will be available, and code accordingly.

If we were to assume that everyone had XFS, then let's assume they all have XFS on Irix, or even Veritas VxFS.

...

It seems to me like anyone likely to end up with a huge enough incoming mailman queue to care about Maildir's inefficiencies would also be able to put a sensible filesystem underneath it.

That may simply not be possible. Moreover, I have some real operational problems with both XFS on Linux and ReiserFS, and I would not run a production mail system using them. Maybe IBM's JFS, if I were forced to run a production mail system on Linux at all, but certainly not XFS or ReiserFS.

To be honest, I wouldn't run a real production mail system on anything less than Veritas VxFS, and I'd be real choosy about my underlying hardware, too -- think Hitachi, not EMC.

So your assumptions about what kinds of filesystems may or may not be appropriate are not necessarily going to coincide with the decisions that other people make, or the kinds of hardware and OS they may be forced to live with.

-- Brad Knowles, <brad@stop.mail-abuse.org>


emf

4:23 p.m.

New subject: Incoming Queue format

Brad Knowles wrote:

...

So your assumptions about what kinds of filesystems may or may not be appropriate are not necessarily going to coincide with the decisions that other people make, or the kinds of hardware and OS they may be forced to live with.

I don't disagree with this assertion, nor am I making assumptions about what people get to live with.

I observe that there is a very finite amount of Mailman developer-hours to be had, and that the problems you're discussing have been addressed by people who spent far more time on the problem than we have available to us.

Furthermore, many MTAs *do* understand Maildir, and most admins do as well; using our own queue-on-disk format means MTAs must access Mailman via LMTP, pipe invocation, or the like, and if there are issues with the queue the administrator likely must learn our queue-on-disk format.

Being able to deliver to mailman even if mailman isn't currently running strikes me as a potential win for some configurations.

Most of the maildir phenomena you have an issue with wouldn't even arise in the use case under discussion; a mail would enter maildir/new, mailman would suck it out, and that would be that; renaming wouldn't occur and the number of elements in the queue is unlikely to become large enough to pressure filesystem indexing schemes.
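The "enter maildir/new, get sucked out" flow can be sketched in a few lines. This is not Mailman's MaildirRunner; the function name drain_new and the handle callback are hypothetical, and a real runner would also need error handling and a polling loop.

```python
import os


def drain_new(maildir, handle):
    """Pass every message in maildir/new to `handle`, then delete it.

    Hypothetical sketch. The file is removed only after handle() returns,
    so a message whose handler raises stays in new/ for the next pass.
    Returns the number of messages processed.
    """
    new_dir = os.path.join(maildir, "new")
    count = 0
    for name in sorted(os.listdir(new_dir)):
        path = os.path.join(new_dir, name)
        with open(path, "rb") as fp:
            handle(fp.read())
        os.unlink(path)
        count += 1
    return count
```

Because messages are consumed straight out of new/, nothing is ever moved into cur/, which is the sense in which no reader-side renaming takes place in this use case.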

~ethan fremen

Brad Knowles

5:45 p.m.

New subject: Incoming Queue format

At 12:23 PM -0400 9/29/06, emf wrote:

...

Furthermore, many MTAs *do* understand Maildir,

MTAs should be sticking to the job of transmitting e-mail between themselves. If they're spending any time mucking about with local mailbox formats, they're making a mistake. That's a job for a Local Delivery Agent, and not an MTA.

Now, granted many mail system packages also include one or more sample LDAs in the tarball, either as an official part of the system or in some sort of contrib/ directory somewhere, but let's make sure that we're using the proper terms for the proper objects.

Therefore, by definition, MTA != LDA.

...

and most admins do as well;

If this discussion has taught me anything, it's that after all these years we still have virtually no one in this business that really does understand Maildir or any other mailbox format, although many claim to do so.

...

using our own queue-on-disk format means MTAs must access Mailman via LMTP, pipe invocation, or the like, and if there are issues with the queue the administrator likely must learn our queue-on-disk format.

Our queue-on-disk format is already much simpler than Maildir, at least when it comes to directory structure, and the directory hashing schemes that I've been talking about have been around for many years. No new thought needs to be put into implementing them.

I even convinced Wietse that he should implement a lot of the same concepts, back when I first got involved in postfix in '98, when it was still being called VMailer.

Now, if you want to get outside of directory structure, our queue-on-disk format includes a lot of things that are Mailman-specific, such as creating message pickles, and I don't think that anyone is talking about getting rid of those aspects.

...

Most of the maildir phenomena you have an issue with wouldn't even arise in the use case under discussion; a mail would enter maildir/new, mailman would suck it out, and that would be that; renaming wouldn't occur and the number of elements in the queue is unlikely to become large enough to pressure filesystem indexing schemes.

You really need to go back and review exactly how messages are created using Maildir.

With Maildir, when a message comes in, a temporary file is opened with truncate (with certain measures taken to try to ensure that the selected filename will be unique), and if that system call succeeds, then the system appends the incoming message and renames the file, before it ever closes it.

If that creat() system call fails, then there is already a file by that name, and the LDA has to try again. This is how they "safely" create files on NFS, with an operation that is supposed to be atomic, and allows them to avoid file locking.
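For reference, here is a sketch of the delivery side as Maildir delivery is commonly described: write the message under tmp/ with a unique name, then link it into new/ and remove the tmp/ entry. It is an illustration under stated assumptions, not any particular LDA's code; the deliver function and the exact unique-name format are made up, and some implementations use rename() instead of link() plus unlink().

```python
import itertools
import os
import socket
import time

_sequence = itertools.count(1)  # per-process delivery counter


def deliver(maildir, data):
    """Write `data` (bytes) into maildir/tmp, then publish it in maildir/new."""
    host = socket.gethostname().replace("/", "_")
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), next(_sequence), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # Exclusive create: fails if the name exists, so no lock is taken.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fp:
        fp.write(data)
        fp.flush()
        os.fsync(fp.fileno())
    os.link(tmp_path, new_path)  # the message becomes visible to readers here
    os.unlink(tmp_path)
    return new_path
```

Each delivery therefore costs at least a create, a link, and an unlink, which are the synchronous meta-data operations this part of the discussion is concerned with.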

I'd have to check, but I think there are some more synchronous meta-data operations in here, too. Certainly, every time you look to see if more messages have come in, you have to scan the entire directory, and you have to stat() each and every file, and if you want to pick up the message and move that somewhere else, you're going to have to do further synchronous meta-data operations that involve locking the entire directory structure while they are taking place.

Now, your application may not see those locking, scanning, and stat operations, you may simply see "move this file to another location", but the underlying filesystem code has to do a lot of work to support that. And regardless of whether you're using locking or not, you still have race conditions that you have to code for -- or at least be aware of, because they are potential areas where you may have problems in the future.

If you don't read the comments that Mark Crispin has written about the weaknesses in Maildir, and you haven't read the code to see what it's actually doing, then I don't see how you can participate in this discussion.

-- Brad Knowles, <brad@stop.mail-abuse.org>


Jim

7:35 p.m.

New subject: Incoming Queue format

On Fri, 29 Sep 2006, Brad Knowles wrote:

...

If you don't read the comments that Mark Crispin has written about the weaknesses in Maildir, and you haven't read the code to see what it's actually doing, then I don't see how you can participate in this discussion.

I did read through both of the references you provided, more out of curiosity than any particular interest in whether or not Maildir is used to address the issues under discussion. To be honest, I don't see how either reference is particularly relevant. The Christenson paper is a nine-year-old architecture document that doesn't appear to make any mention of Maildir. At best it seems to imply that there are perhaps some better ways to solve some of the problems Maildir was designed to address. Crispin's comments appear to specifically address Maildir inefficiencies associated with management by an IMAP server. These inefficiencies are mostly (entirely?) unrelated to the delivery and one-time read scenario that is under discussion for Mailman (assuming I correctly understand the proposed feature). And to be fair, even the nature and severity of these Maildir inefficiencies are open to debate. See for example http://www.courier-mta.org/mbox-vs-maildir/#theend.

And since when is reading all source code for all programs, system calls, etc. associated with a Mailman function a requirement for making a comment or expressing an opinion on this list? If by chance you feel that you have such insight into the source code, share it (as you have) and leave it at that. Don't try to exclude people out of hand and scare off others who might have useful input. Among other things a list is a place for learning and gaining a deeper understanding of the issues associated with the topic.

Jim

Brad Knowles

6:24 p.m.

New subject: Incoming Queue format

At 12:23 PM -0400 9/29/06, emf wrote:

...

...
So your assumptions about what kinds of filesystems may or may not be appropriate are not necessarily going to coincide with the decisions that other people make, or the kinds of hardware and OS they may be forced to live with.

I don't disagree with this assertion, nor am I making assumptions about what people get to live with.

Actually, that is precisely what you did. You said that we shouldn't bother implementing something that XFS and ReiserFS would fix anyway, and that people who would be running Mailman would "obviously" choose to run the "best" filesystem for their application.

Your claims to the contrary are disingenuous, at best.

...

Most of the maildir phenomena you have an issue with wouldn't even arise in the use case under discussion;

We're talking about maildir-mailbox-per-list. So, all known issues with Maildir mailboxes would be applicable, because they would *be* Maildir mailboxes. The fact that we would be using Maildir mailboxes as a method of handling incoming messages instead of having them written to a 7th edition mbox-format mailbox, is purely an application-level implementation detail.

That said, I'm done arguing with you and Barry. You guys go do whatever you want.

-- Brad Knowles, <brad@stop.mail-abuse.org>


Carson Gaspar

10:37 p.m.

New subject: Incoming Queue format

--On Thursday, September 28, 2006 11:39 PM -0500 Brad Knowles <brad@stop.mail-abuse.org> wrote:

...

At 11:19 PM -0400 9/28/06, emf wrote:

...
It seems to me like anyone likely to end up with a huge enough incoming mailman queue to care about Maildir's inefficiencies would also be able to put a sensible filesystem underneath it.

That may simply not be possible. Moreover, I have some real operational problems with both XFS on Linux and ReiserFS, and I would not run a production mail system using them. Maybe IBM's JFS, if I were forced to run a production mail system on Linux at all, but certainly not XFS or ReiserFS.

Brad, if your _incoming_ queue is so big that you have to worry, your servers are woefully underspec'd. I understand your dislike for Maildir, but for the _incoming_ queue case, it just shouldn't matter. If you can provide a detailed use case where it matters for the _incoming_ queue, please do so.

-- Carson

Brad Knowles

1:52 a.m.

New subject: Incoming Queue format

At 3:37 PM -0700 9/29/06, Carson Gaspar wrote:

...

Brad, if your _incoming_ queue is so big that you have to worry, your servers are woefully underspec'd.

That may or may not be true, but that doesn't make the problem magically go away.

...

If you can provide a detailed use case where it matters for the _incoming_ queue, please do so.

Basically, it's any site that has one or more lists with relatively high levels of incoming traffic, and which runs into synchronous meta-data bottleneck issues. This could be a lower-end machine with a filesystem that does not perform as well as it could, and a more moderately sized list. Or, this could be a site where they've already thrown the biggest/best configured machine at the problem that they can, and yet they're still seeing problems.
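To make the meta-data point concrete, here is a minimal sketch of a Maildir-style delivery: write the message under `tmp/`, force it to disk, then rename it into `new/`. It is an assumption-laden illustration rather than anything from Mailman or a particular MTA. The function name `deliver` is hypothetical, the unique file name is simplified (real implementations use a time.pid.host scheme), and the original Maildir description uses link-and-unlink where this uses a rename.

```python
import os
import tempfile
import time
import uuid


def deliver(maildir, data):
    """Deliver message bytes using the write-to-tmp, move-to-new pattern."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Simplified unique name (assumption, not the canonical scheme).
    unique = "%d.%d.%s" % (int(time.time()), os.getpid(), uuid.uuid4().hex)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # Meta-data operation 1: create a new directory entry in tmp/.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # wait until the data is on disk
    # Meta-data operation 2: move the entry from tmp/ to new/.
    os.rename(tmp_path, new_path)
    return new_path


with tempfile.TemporaryDirectory() as demo_root:
    delivered = deliver(demo_root, b"Subject: demo\n\nhello\n")
    delivered_in_new = os.path.dirname(delivered) == os.path.join(demo_root, "new")
```

Each incoming message therefore costs a file creation, a sync, and a rename. On a filesystem that performs directory updates synchronously, those per-message operations are the kind of cost being described here.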

On the high end, from the reports we got after Apple upgraded the system for lists.apple.com, I don't think they're too likely to have these kinds of problems. But the FreeBSD folks might already be there, and I imagine that the SourceForge people are definitely there.

But I'm sure there are many smaller lists running on smaller or less well configured hardware that are running into the same kinds of problems.

All this aside, it's clear that I'm not going to convince anyone here of anything, so I'm just going to unsubscribe from the list and I'll ask everyone to make sure that they do not include me on any further messages on this subject.

-- Brad Knowles, <brad@stop.mail-abuse.org>



participants (12)

Barry Warsaw
Bob Puff＠NLE
Brad Knowles
Carson Gaspar
Dale Newfield
emf
Ian Eiloart
Jim
John A. Martin
John W. Baxter
Nigel Metheringham
Tokio Kikuchi
