[Mailman-Developers] LMTP for incoming mail
Brad Knowles
brad at stop.mail-abuse.org
Thu Sep 28 18:06:29 CEST 2006
At 10:09 AM -0400 9/28/06, Barry Warsaw wrote:
>> Does it have to be GPL? Is a Berkeley-type license not okay?
>
> GPL would be best, but Berkeley is probably okay. We'd probably
> want to get confirmation of that from the FSF. The key thing is
> that it has to be compatible with the GPL (and the Python
> Software License -- see below) so that we can combine the whole
> kit and kaboodle.
Are there any license questions or issues that we would need to have
answered or confirmed by the Sendmail Consortium? Or should we wait
on that until we've heard back from the FSF?
>> Dunno about doing it in Python, but I will say that going
>> to Maildir as an additional queue-on-disk mechanism on top of
>> everything else we're already doing seems to be a big step
>> backward in terms of potential performance issues and I don't
>> really see any significant positive benefit.
>
> I don't think it's an additional queue-on-disk mechanism,
> certainly in comparison to what we're doing today.
Maildir was not designed as an efficient queue-on-disk strategy. It
was designed to allow multiple simultaneous parallel deliveries to
the NFS-mounted mailbox of a given user, and we know that it does a
number of additional unnecessary things that seriously hurt its
performance even in that relatively tightly defined context.
It does unnecessary file renames (which cause additional synchronous
meta-data filesystem operations), it uses filenames that are too long
and bust iname/inode caching schemes, and it doesn't make use of
obvious significant performance-enhancing mechanisms like directory
hashing.
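To make the rename cost concrete, here is a minimal sketch of a Maildir-style delivery in modern Python. The function name maildir_deliver and the simplified unique-name format are assumptions made for illustration; this is not Mailman code and not a complete Maildir implementation.

```python
import itertools
import os
import socket
import time

# Per-process counter so two deliveries in the same second get distinct names.
_counter = itertools.count()


def maildir_deliver(maildir, data):
    """Write one message into tmp/, then rename it into new/ (hypothetical helper)."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Simplified unique name: timestamp, pid, counter, hostname.
    # Names built this way are long, which is one of the costs noted above.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), next(_counter), host)

    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # First metadata operation: create the file in tmp/ and flush it to disk.
    with open(tmp_path, "wb") as fp:
        fp.write(data)
        fp.flush()
        os.fsync(fp.fileno())

    # Second metadata operation: the rename that makes the message visible.
    os.rename(tmp_path, new_path)
    return new_path
```

Every message pays for a file creation plus a rename, and everything lands in the single new/ directory, with no hashing into subdirectories.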
It's pretty easy to design a mechanism that is much more efficient --
and scalable -- in handling multiple simultaneous deliveries to a
user mailbox on NFS.
So why would we want to abuse a bad scheme for user-mailbox-on-NFS as
an alternative scheme for queue-on-disk?
If we have queue-on-disk problems, why not solve them by implementing
a more efficient queue-on-disk scheme, instead of abusing a poorly
designed user-mailbox-on-NFS scheme?
> That way,
> you're not dumping all messages destined for Mailman into
> one directory. Not as good as directory hashing, but
> better than what we have today.
That would be somewhat of an improvement in some respects, but
Maildir also brings along a lot of additional baggage and I'm not at
all convinced that it's worth the effort.
> I'll grant you that LMTP delivery has the potential to be
> the most efficient mechanism by which messages get from
> the MTA into Mailman. But it's certainly more work and
> more complicated than maildir; will you grant that maildir
> is better than what we have today? Think of it as a
> waystation on the road to the ultimate uber-performing
> list server. :)
I'm not at all convinced that Maildir would be an overall improvement
over what we have today. I think that adding a directory hashing
scheme to the existing fork()/exec() model would probably be a bigger
improvement than replacing fork()/exec() inbox delivery with
Maildir.
At least by sticking with fork()/exec() and adding a directory
hashing scheme on top of that, we wouldn't need to make any changes
to the way we interface with MTAs today -- all the changes could be
kept completely internal to Mailman. If we were to switch to Maildir
as an inbox delivery method, not only would we have to change the way
we interface with MTAs, we would also have to make internal changes
to Mailman to support the use of Maildir as our queue-on-disk
mechanism. That's a bigger overall change with bigger risk and
relatively lower potential payoff.
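As a rough illustration of what a directory hashing scheme could look like, the sketch below spreads queue files over two levels of subdirectories derived from a hash of the file name. The names hashed_path and enqueue, the SHA-1 choice, and the two-level, two-character layout are all assumptions for the example, not Mailman's actual queue layout.

```python
import hashlib
import os


def hashed_path(queue_root, name, levels=2, width=2):
    """Map a queue file name to queue_root/ab/cd/name (hypothetical layout)."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Take successive slices of the digest as nested directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(queue_root, *parts, name)


def enqueue(queue_root, name, data):
    """Store one message under its hashed directory and return the path."""
    path = hashed_path(queue_root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fp:
        fp.write(data)
    return path
```

With these defaults the files are spread over up to 65,536 leaf directories, so no single directory has to hold the whole queue, and nothing about the MTA interface changes.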
If we were to work on implementing a directory hashing scheme instead
of working on Maildir, we could still add LMTP at a later date.
That would allow us to go back at a later time and enhance the
features that we provide to mailing list administrators, while also
giving us time to look more deeply into the potential performance
issues and make sure that we're not causing more problems than we're
solving.
> Let me just say that ideally, I think LMTP would be a
> great way to go. It's not my top priority though. I'm
> looking for ways to get more developers involved in the
> project, and this seems like a perfect thing for someone
> seeking Mailman fame and fortune <wink>.
I'm not convinced that this is an improvement.
> So, anyone care to take the challenge?
I'm not a developer, but I do have experience with building
large-scale mail and mailing list systems, and if you're willing to
listen to me then I'm willing to give you the benefit of my
experience.
IMO, Maildir is a Red Herring. The one and only reason to ever
consider using Maildir is if you're implementing a large-scale IMAP
mail server system and you're required to store user mailboxes on NFS.
Even then, you'd be well-served to look for better storage
mechanisms, because throwing potentially hundreds of thousands of
messages into a single directory is guaranteed to cause huge
performance issues, even if every single mailbox operation didn't
involve scanning the entire directory and doing a stat() on every
single file, locking the entire directory, creating/renaming/deleting
the file(s) as appropriate, and then unlocking the directory.
I think we're better off spending our resources working on trying to
resolve the real bottleneck issues that we already know are present
in our system as opposed to working on cool stuff that may or may not
help but would require more overall changes to more parts of the
system and with relatively lower potential payoff.
--
Brad Knowles, <brad at stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
Founding Individual Sponsor of LOPSA. See <http://www.lopsa.org/>.