Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 13:11:01 -0800 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
On Oct 29, 2003, at 1:05 PM, David Birnbaum wrote:
- third-party add-ons make it that much harder to install. If I have to set up a Mysql or Postgres database to use Mailman, it's a step that will put off people who don't already have it going.
actually, if you do it right, it's much easier -- because when you build in those tools, you build in standardized interfaces that third party add-ons can access, instead of the current case, which are code hacks that break every time Barry burps at the CVS server...
Aye, picking the right interface abstractions is key.
There's also a disjoint between the novice SysAdm case who loves the fact of Mailman's all-in-one service, and the more meaty chap who integrates what he needs to. Much of Mailman's appeal at the low end is its all-in-one simple-to-install nature. (Well, ignoring the GID FAQ...)
Mailman v2.1 has a plugin layer for the membership roster. It's not a fully mature interface, but there are LDAP and SQL adaptors in the wild. At some point those adaptors will move into the Mailman core. If we move the archiving components (storage, presentation, index) behind plugin interfaces as well, there's a reasonable opportunity for similar third parties to build adaptor layers which then also move into the Mailman core.
Oh yeah, and just to keep Nigel Metheringham hopping:
Mailman just doesn't have enough configuration options.
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
On Wed, 2003-10-29 at 16:54, J C Lawrence wrote:
Aye, picking the right interface abstractions is key.
Right on.
There's also a disjoint between the novice SysAdm case who loves the fact of Mailman's all-in-one service, and the more meaty chap who integrates what he needs to. Much of Mailman's appeal at the low end is its all-in-one simple-to-install nature. (Well, ignoring the GID FAQ...)
Yep, and I really really want Mailman 3 to take this concept further. Some things that I think will help include using Twisted to eliminate the /requirement/ of Apache integration and possibly the incoming mail server integration, as well as implementing a bulk mailer to eliminate the need for an outgoing mail server. Ideally, it will still be possible to integrate with Postfix for incoming and outgoing mail, but it shouldn't be necessary to get up and running.
Mailman v2.1 has a plugin layer for the membership roster. It's not a fully mature interface, but there are LDAP and SQL adaptors in the wild.
This interface was largely bolted on, so it's clumsy. Mailman 3 will be defined by interfaces from the start.
At some point those adaptors will move into the Mailman core. If we move the archiving components (storage, presentation, index) behind plugin interfaces as well there's a reasonable opportunity for similar third parties to build adaptor layers which then also move into the Mailman core.
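To make the "plugin interface" idea concrete, here is a minimal sketch of what a storage interface for the archiver might look like, with an in-memory default so that a basic install needs no external database. This is not code from any Mailman release: the names ArchiveStore and MemoryStore and every method signature are hypothetical, and the sketch is written in modern Python rather than the Python of 2003.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ArchiveStore(ABC):
    """Hypothetical storage interface; not part of any Mailman release."""

    @abstractmethod
    def add(self, list_name: str, message_id: str, raw: bytes) -> None:
        """Persist one message for the given list."""

    @abstractmethod
    def get(self, list_name: str, message_id: str) -> Optional[bytes]:
        """Return the stored message, or None if it is unknown."""

    @abstractmethod
    def message_ids(self, list_name: str) -> List[str]:
        """Return the ids stored for a list, in insertion order."""


class MemoryStore(ArchiveStore):
    """Default backend for the sketch: nothing external to install."""

    def __init__(self) -> None:
        self._lists: Dict[str, Dict[str, bytes]] = {}

    def add(self, list_name: str, message_id: str, raw: bytes) -> None:
        self._lists.setdefault(list_name, {})[message_id] = raw

    def get(self, list_name: str, message_id: str) -> Optional[bytes]:
        return self._lists.get(list_name, {}).get(message_id)

    def message_ids(self, list_name: str) -> List[str]:
        return list(self._lists.get(list_name, {}))


# Callers depend only on the interface, so an SQL or LDAP adaptor
# could be swapped in without touching this code.
store: ArchiveStore = MemoryStore()
store.add("mailman-developers", "<1@example.org>", b"Subject: hi\n\nbody\n")
```

A third-party SQL backend would subclass ArchiveStore in the same way; presentation and index components could sit behind similar interfaces.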
Oh yeah, and just to keep Nigel Metheringham hopping:
Mailman just doesn't have enough configuration options.
Heh. That's another issue. I'm sure Mailman 3 will grow many more configuration options. The trick is making them manageable (and mostly ignorable -- i.e. the defaults Usually Work out of the box).
I've been experimenting with ideas for list styles which will make list admins' lives easier, I think, without reducing the flexibility for experts.
-Barry
At 11:01 PM -0500 2003/10/29, Barry Warsaw wrote:
Yep, and I really really want Mailman 3 to take this concept further. Some things that I think will help include using Twisted to eliminate the /requirement/ of Apache integration and possibly the incoming mail server integration, as well as implementing a bulk mailer to eliminate the need for an outgoing mail server.
There, I have to disagree. Both the web server and the mail server issues are complex enough that I don't believe it would be a good idea to try and re-invent this wheel. There are already enough bad web server and mail server implementations out there -- we don't need to make this situation worse.
There may be some mailing-list specific issues that we can (and should) handle better inside mailman before we hand these things off to the other servers, but both Apache and postfix/sendmail/exim have enough experience and world-wide testing behind them to make it little else than folly resulting from hubris to try and replace them.
There's just no substitute for having hundreds of millions of people world-wide pounding on these things day-in and day-out 365 days a year.
Components like this should be scheduled for replacement if, and only if, you can demonstrate beyond a reasonable doubt that there are inherent problems that are insurmountable otherwise, and there is no feasible alternative.
You don't just take a Tom Mix pocket knife and cut open your own chest and remove your heart, to replace it with a mechanical pump that you designed yourself out of a tin can, a turkey baster, some baling wire, and some garden hose.
If you absolutely require a heart transplant and there are no human alternatives, you get a world-respected heart surgeon to perform the operation using the latest techniques and the Jaarvik 9 (or whatever). And then you get everyone in your family, all your friends, all your neighbors, all your church members, and hopefully all religious people world-wide to pray for you.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
On Oct 29, 2003, at 8:26 PM, Brad Knowles wrote:
There may be some mailing-list specific issues that we can (and should) handle better inside mailman before we hand these things off to the other servers, but both Apache and postfix/sendmail/exim have enough experience and world-wide testing behind them to make it little else than folly resulting from hubris to try and replace them.
+1
I've experimented with direct-out-the-pipe delivery systems. Trust me, you don't want to go there. It's not trivial. Well, it's trivial for 90% of the world that follows the RFCs and behaves as expected and has the right DNS setups and isn't trying to outsmart spammers by being stupid. And you'll spend the other 90% of your time trying to build compatibility in with the other 10%.
On Wed, 2003-10-29 at 23:26, Brad Knowles wrote:
There, I have to disagree. Both the web server and the mail server issues are complex enough that I don't believe it would be a good idea to try and re-invent this wheel. There are already enough bad web server and mail server implementations out there -- we don't need to make this situation worse.
Let's not discount the integration problems, which are a huge headache for newbies. I'm fairly certain that Twisted is the right approach for surfacing the web u/i to Mailman. The requirements are not overwhelming and fronting Mailman's u/i with Apache really doesn't buy us that much. We all agree that CGI sucks, and we could make that better with mod_python or some other such glue, but why go to the trouble?
Relying on Twisted for the incoming mail protocols is something I'm less certain about, although there is a lot of appeal to this approach. We could throw lots of smarts into a Python port-25 listener, including global spam fighting and bounce processing. An approach like Exim + elspy affords some really cool possibilities. A bigger negative is that there's less precedent for proxying smtpd than there is for httpd, so it's harder to fit Mailman into the mix with an existing mail server.
-Barry
On Wed, 2003-10-29 at 23:36, Chuq Von Rospach wrote:
I've experimented with direct-out-the-pipe delivery systems. Trust me, you don't want to go there. It's not trivial. Well, it's trivial for 90% of the world that follows the RFCs and behaves as expected and has the right DNS setups and isn't trying to outsmart spammers by being stupid. And you'll spend the other 90% of your time trying to build compatibility in with the other 10%.
Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble?
Also, there's incoming SMTP and outgoing SMTP. It may be possible to build in support for one direction without providing the other. (It also may not be worth it.)
-Barry
At 8:36 PM -0800 2003/10/29, Chuq Von Rospach wrote:
I've experimented with direct-out-the-pipe delivery systems. Trust me, you don't want to go there. It's not trivial. Well, it's trivial for 90% of the world that follows the RFCs and behaves as expected and has the right DNS setups and isn't trying to outsmart spammers by being stupid. And you'll spend the other 90% of your time trying to build compatibility in with the other 10%.
You'd have the same sorts of problems if you added your own interpretation of the MIME bodyparts and stored the attachments separately, and then tried to re-integrate everything on transmission.
Indeed, you'd have a whole host of additional problems you'd add because not only would you be trying to format everything on output so that everyone would like what you send, you'd be assuming that you can always correctly parse your inputs and correctly handle the results.
IMO, you're much better just storing exactly what you got, and then sending exactly what you stored when the time comes. Any misunderstandings are therefore the fault of the sender or recipient, and not the result of anything you added to the complexity mix.
Or have I missed something here?
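The "store exactly what you got, send exactly what you stored" approach can be sketched as follows. This is only an illustration: RawStore and its methods are hypothetical names, and keying each message by its SHA-256 digest is an assumption made here so that byte-for-byte fidelity can be checked on the way out. The stored bytes are never parsed or re-serialized.

```python
import hashlib
from typing import Dict


class RawStore:
    """Hypothetical store that keeps messages as opaque bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, raw: bytes) -> str:
        # The key is the digest of the bytes as received; no MIME parsing.
        key = hashlib.sha256(raw).hexdigest()
        self._blobs[key] = raw
        return key

    def get(self, key: str) -> bytes:
        raw = self._blobs[key]
        # Cheap integrity check: what goes out must hash to its own key.
        if hashlib.sha256(raw).hexdigest() != key:
            raise ValueError("stored message was altered")
        return raw


original = (
    b'Content-Type: multipart/mixed; boundary="x"\r\n'
    b"\r\n"
    b"--x\r\n\r\nodd but legal formatting stays untouched\r\n--x--\r\n"
)
raw_store = RawStore()
key = raw_store.put(original)
outgoing = raw_store.get(key)
```

Because nothing is reformatted, any quirk in the message is whatever the sender wrote, which is the point being made above.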
-- Brad Knowles, <brad.knowles@skynet.be>
On Oct 29, 2003, at 8:53 PM, Barry Warsaw wrote:
Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble?
I think you have enough on your plate to not re-invent what others have already done pretty well. When you run out of features to implement, then think about this. Not until.
At 11:53 PM -0500 2003/10/29, Barry Warsaw wrote:
Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble?
Bryan Costales and Eric Allman had this debate at InfoBeat/Mercury Mail. Bryan said that he could write a better "simple" MTA that could handle the easy 80% and leave the hard 20% to sendmail. Eric showed that he could improve sendmail to the point where it would perform at or near the level of performance of Bryan's code without throwing everything out, and would out-perform every other aspect of the system in question (so that the MTA was no longer the bottleneck at any stage).
I'm confident that the same sort of approach is appropriate for other well-respected MTAs (e.g., postfix, and exim in my personal experience).
Also, there's incoming SMTP and outgoing SMTP. It may be possible to build in support for one direction without providing the other. (It also may not be worth it.)
It's hard enough writing an incoming SMTP handler, and doing it right. Many large service providers have seriously screwed up when trying to do so (bigfoot anyone?), and others have only implemented half of the inbound solution (AOL), leaving the harder parts to standard programs like sendmail.
Even then I argued violently against this approach at AOL, and felt that we could do a better job by leaving all the external interfacing/queueing issues to sendmail, and instead make the in-house developed code an LMTP Local Delivery Agent. I was over-ruled, primarily because we had already gone too far down the road that had been chosen for us. Note that none of the original Internet Mail Operations team members are left at AOL (almost all bugged out when the new mail server software came online), and I don't think any of the original Internet Mail Development team members are left, either.
Bad Juju, Bwana.
I've been down this road before. Trust me, you don't want to do this.
-- Brad Knowles, <brad.knowles@skynet.be>
On Oct 29, 2003, at 9:08 PM, Brad Knowles wrote:
Bryan Costales and Eric Allman had this debate at InfoBeat/Mercury Mail. Bryan said that he could write a better "simple" MTA that could handle the easy 80% and leave the hard 20% to sendmail.
There is no such thing as a simple MTA. This gets hairy quickly. Really quickly.
you are much better off spending money on a good fast disk RAID (since the chances that you'll win the lottery are on par with the chances that your bottleneck is NOT disk I/O in mail sending) than on a programmer to try to build fast MTAs.
Note that none of the original Internet Mail Operations team members are left at AOL (almost all bugged out when the new mail server software came online), and I don't think any of the original Internet Mail Development team members are left, either.
And boy, does it show.
At 9:16 PM -0800 2003/10/29, Chuq Von Rospach wrote:
There is no such thing as a simple MTA. This gets hairy quickly. Really quickly.
Bryan is one of the few people I would expect to be able to do something that could actually handle the easy 80%. Writing the book _sendmail_ (now in its fourth edition) is just one of his many talents.
you are much better off spending money on a good fast disk RAID (since the chances that you'll win the lottery are on par with the chances that your bottleneck is NOT disk I/O in mail sending) than on a programmer to try to build fast MTAs.
They were already using pure RAM disks for this application. Disk I/O was not the problem.
Bryan and Eric were two major contributors to my invited talks "Sendmail Performance Tuning for Large Systems" (see <http://www.shub-internet.org/brad/papers/sendmail-tuning/>) and "Design and Implementation of Highly Scalable E-mail Systems" (see <http://www.shub-internet.org/brad/papers/dihses/>). These guys are not lightweights in this field.
And boy, does it show.
Indeed.
-- Brad Knowles, <brad.knowles@skynet.be>
Ok, I'm beat up enough, so let me open things up to a hopefully more productive thread. How can Mailman more efficiently hand off messages to a local mail server for final delivery?
Some problems with the current approach include:
The desire/requirement that Mailman chunk and sort recipients
The ability for Mailman to swamp the mail server or cause the mail server to consume all available CPU
The fact that failures in the upstream mail server are reported to Mailman as bounces instead of as error codes
Inefficiencies in VERP/personalization/mail-merge because of the lack of cooperation
The need for Mailman to queue outgoing messages that aren't completely delivered
I'm sure you guys can identify more issues <wink>. Look at the complexity in SMTPDirect.py, and even there, we still have problems.
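For readers unfamiliar with the "chunk and sort" step mentioned in the list above, the following is a minimal sketch of the general idea: group recipients by domain so that one SMTP transaction can cover several recipients at the same site, then cap the size of each group. It is not the algorithm in SMTPDirect.py; the function name chunk_by_domain and the max_chunk parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def chunk_by_domain(recipients: Iterable[str], max_chunk: int = 3) -> List[List[str]]:
    """Group addresses by domain, then split each group into capped chunks."""
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for addr in recipients:
        # Everything after the last "@" is treated as the domain.
        domain = addr.rpartition("@")[2].lower()
        by_domain[domain].append(addr)

    chunks: List[List[str]] = []
    for domain in sorted(by_domain):
        addrs = sorted(by_domain[domain])
        for start in range(0, len(addrs), max_chunk):
            chunks.append(addrs[start:start + max_chunk])
    return chunks


chunks = chunk_by_domain(
    [
        "b@example.org",
        "a@example.com",
        "a@example.org",
        "c@example.com",
        "d@example.com",
    ],
    max_chunk=2,
)
```

Each inner list would become one delivery transaction. This is exactly the kind of bookkeeping the message argues should live on the mail server's side of the boundary.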
So how do we design a system where we can push the complexity and efficiency concerns out past our boundary? Here's a rough sketch of what I'd like:
Mailman has a list of recipients, or at least knows how to calculate that list. It has a message template as encoded 7-bit ASCII. It has a dictionary (association table, hash table) of substitution placeholders to values for each recipient, or knows how to calculate that.
Mailman wants to simply hand that data off to some agent and forget about it. It wants to know that the agent will make best effort to mail merge and deliver. It wants to be informed of any final delivery failures. And that's it. Mailman doesn't want to chunkify recipients, and it doesn't want to sort them. It doesn't want to worry about a mail server effectively managing system resources. I'd rather not have to hand it a couple of meg of recipient or substitution data, but there seems to be no other way.
So what can we do here to improve matters?
-Barry
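The hand-off described in the message above (a recipient list, one message template, and a per-recipient dictionary of substitutions) can be sketched in a few lines. This is an illustration under stated assumptions, not a proposed Mailman API: the function name mail_merge, the placeholder names, and the choice of string.Template syntax are all hypothetical.

```python
from string import Template
from typing import Dict, Iterator, Tuple


def mail_merge(
    template: str, substitutions: Dict[str, Dict[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (recipient, personalized message text) pairs.

    `substitutions` maps each recipient address to its placeholder values.
    A missing placeholder raises KeyError rather than sending a bad message.
    """
    compiled = Template(template)
    for recipient, values in substitutions.items():
        yield recipient, compiled.substitute(values)


TEMPLATE = (
    "To: $address\n"
    "Subject: Weekly digest\n"
    "\n"
    "Hello $name,\n"
    "Unsubscribe: $unsub_url\n"
)

SUBSTITUTIONS = {
    "anne@example.com": {
        "address": "anne@example.com",
        "name": "Anne",
        "unsub_url": "http://lists.example.org/unsub/anne",
    },
    "bart@example.net": {
        "address": "bart@example.net",
        "name": "Bart",
        "unsub_url": "http://lists.example.org/unsub/bart",
    },
}

merged = dict(mail_merge(TEMPLATE, SUBSTITUTIONS))
```

In the design sketched in the message, the template and the substitution table are all Mailman would hand over; the merge loop itself would run inside the delivery agent.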
At 9:53 AM -0500 2003/10/30, Barry Warsaw wrote:
I'm sure you guys can identify more issues <wink>. Look at the complexity in SMTPDirect.py, and even there, we still have problems.
I'm not a programmer, so I can't really help you there. ;-(
So how do we design a system where we can push the complexity and efficiency concerns out past our boundary?
I can say that I think we need to look at all of the recommendations in the following papers:
"Tuning Sendmail for Large Mailing Lists"
Rob Kolstad
Proceedings of LISA '97
http://tinyurl.com/t09c
"Drinking from the Fire(walls) Hose:
Another Approach to Very Large Mailing Lists"
Strata Rose Chalup, Christine Hogan, Greg Kulosa, Bryan McDonald,
and Bryan Stansell
Proceedings of LISA '98
http://tinyurl.com/t09k
There may be others that we need to look at, but of which I am not (yet) aware. If anyone knows of any, please let me know.
We're already doing some of the things recommended in these papers, but not everything. And I think there may be a couple more things we can do that are not mentioned, but which would be a further help.
However, if you want to hand all this work to an external "final mail-merge delivery agent", this is moot. We just need to make sure that the selected FMMDA addresses all these issues. We could use an existing tool (e.g., bulk_mailer from <ftp://cs.utk.edu/pub/moore/bulk_mailer/>), or we could create a separate package to address this issue (of course, that brings the ball back into our court).
Or, you could just have Chuq solve this problem for you, as he mentioned in <http://mail.python.org/pipermail/mailman-developers/2000-May/006820.html>. ;-)
So what can we do here to improve matters?
Sounds to me like you want to externalize this whole process. Problem is, bulk_mailer is the only tool I know of that currently exists as a partial attempt to address this problem, although perhaps some additional work on it could fill in the rest. Alternatively, you develop, or work with someone else to develop, an alternative to bulk_mailer that does all the things you want and which can be used as an external tool.
-- Brad Knowles, <brad.knowles@skynet.be>
On Oct 30, 2003, at 7:48 AM, Brad Knowles wrote:
"Tuning Sendmail for Large Mailing Lists" http://tinyurl.com/t09c
400K/day aggregate max
"Drinking from the Fire(walls) Hose: http://tinyurl.com/t09k
380K/day aggregate max
(yawn. My server's bored. snicker)
but seriously, both of them are built around pre sendmail 8.12
environments. there's some interesting stuff there, but it's now fairly
dated, since sendmail 8.12 really changes the landscape. And all of
those other environments....
Or, you could just have Chuq solve this problem for you, as he
mentioned in
<http://mail.python.org/pipermail/mailman-developers/2000-May/006820.html>. ;-)
gack.
So what can we do here to improve matters?
Sounds to me like you want to externalize this whole process. Problem
is, bulk_mailer is the only tool
Because pretty much every MLM has internalized the process. By the end
of November, I'll have completely retired any use of bulk_mailer on my
systems for other solutions.
One big reason: increasing spam blocking (stupid or otherwise) of
non-individually addressed email. The old list server setup of:
to: subscribers of list <list@foo>
bcc: bulk_drop@of.subscribes
is increasingly risky as far as delivery is concerned. I also don't
think it allows for the kind of personalization that's needed for your
general audiences (help URLs, unsub URLs, etc).
And with sendmail 8.12, queue groups and envelope splitting, frankly,
bulk_mailer does more harm to the delivery stream than good. Just stuff
it into sendmail, tune sendmail to split intelligently. bulk_mailer is
obsolete... and much to my amusement, a few sites block based on its
use in headers (idiots), which is why my copy identifies itself as
ulkbay_ailermay.
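One thing individually addressed delivery makes possible is VERP, which was listed earlier in the thread among the current pain points: each copy gets its own envelope sender, so a bounce identifies the failing subscriber without parsing the bounce body. The sketch below assumes the common "bounces+mailbox=host@listdomain" layout; the function names and the example addresses are hypothetical, and a real deployment would follow whatever format the list server is configured with.

```python
from typing import Optional


def verp_sender(bounces: str, list_domain: str, recipient: str) -> str:
    """Build a per-recipient envelope sender for one outgoing copy."""
    mailbox, _, host = recipient.rpartition("@")
    return f"{bounces}+{mailbox}={host}@{list_domain}"


def verp_recipient(envelope_to: str, bounces: str) -> Optional[str]:
    """Recover the subscriber address from the address a bounce was sent to.

    Returns None when the address does not use the assumed VERP layout.
    """
    local, _, _domain = envelope_to.rpartition("@")
    prefix = bounces + "+"
    if not local.startswith(prefix):
        return None
    mailbox, sep, host = local[len(prefix):].rpartition("=")
    if not sep or not mailbox or not host:
        return None
    return f"{mailbox}@{host}"


sender = verp_sender("mylist-bounces", "lists.example.org", "anne@example.com")
decoded = verp_recipient(sender, "mylist-bounces")
```

The cost is the one the thread keeps returning to: one envelope per recipient means the single-copy Bcc trick no longer applies, so the MTA's own splitting and queueing has to carry the load.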
At 8:41 AM -0800 2003/10/30, Chuq Von Rospach wrote:
(yawn. My server's bored. snicker)
Understood, but the techniques they recommend are still valid.
but seriously, both of them are built around pre sendmail 8.12 environments.
True.
there's some interesting stuff there, but it's now fairly dated, since sendmail 8.12 really changes the landscape. And all of those other environments....
There are still some things that even sendmail 8.12, postfix, etc... do not do.
One of them is recipient sorting by average delivery time over the past week (probably want a decaying geometric mean), which would require tracking log data on a per-recipient basis.
Another is two-level message handling, by configuring the MTA for the initial delivery attempt to use very low timeouts, but then to fall back to a secondary MTA (or MTA pool) that uses more standard timeouts for those sites that are slower.
I'm sure there are others.
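The recipient-sorting idea above can be sketched as follows. One reading of "decaying geometric mean" is an exponentially weighted average of the logarithm of each delivery time, which is what this sketch implements; that reading, the decay factor, the class name DeliveryStats, and the choice to sort never-seen recipients last are all assumptions, not anything specified in the thread.

```python
import math
from typing import Dict, Iterable, List, Optional


class DeliveryStats:
    """Track a decaying geometric mean of delivery time per recipient."""

    def __init__(self, decay: float = 0.5) -> None:
        # decay is the weight kept by the old average on each new sample.
        self.decay = decay
        self._log_mean: Dict[str, float] = {}

    def record(self, recipient: str, seconds: float) -> None:
        sample = math.log(max(seconds, 1e-3))  # clamp to avoid log(0)
        old = self._log_mean.get(recipient)
        if old is None:
            self._log_mean[recipient] = sample
        else:
            self._log_mean[recipient] = self.decay * old + (1 - self.decay) * sample

    def mean(self, recipient: str) -> Optional[float]:
        value = self._log_mean.get(recipient)
        return None if value is None else math.exp(value)

    def fastest_first(self, recipients: Iterable[str]) -> List[str]:
        # Recipients with no history sort last (an arbitrary choice here).
        return sorted(recipients, key=lambda r: self._log_mean.get(r, math.inf))


stats = DeliveryStats(decay=0.5)
stats.record("fast@example.com", 1.0)
stats.record("fast@example.com", 1.0)
stats.record("slow@example.net", 100.0)
stats.record("slow@example.net", 1.0)
order = stats.fastest_first(
    ["slow@example.net", "new@example.org", "fast@example.com"]
)
```

The per-recipient log data mentioned above is what would feed record(); the sorted list would then be handed to the MTA so that quick destinations are not stuck behind slow ones.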
Because pretty much every MLM has internalized the process.
Indeed. So, is Barry going the right way by trying to externalize this, or should the internal methods be beefed up so that they more fully address the issues in question?
And with sendmail 8.12, queue groups and envelope splitting, frankly, bulk_mailer does more harm to the delivery stream than good. Just stuff it into sendmail, tune sendmail to split intelligently. bulk_mailer is obsolete...
Perhaps in its current form, that is true. However, not all sites are using sendmail 8.12, and of the ones that are, most are probably not using it in a manner that is more suitable for mailing lists.
So, this kind of tool does still have its uses at most sites, and it could certainly be extended to address the issues that even the most modern MTAs do not (yet) attempt to handle.
However, given the issues you've mentioned, it would probably be a good idea to be able to turn off selected "bulk_mailer" type features, so that you can let the MTA do more of its job better -- if it is configured to do so.
-- Brad Knowles, <brad.knowles@skynet.be>
participants (4)
- Barry Warsaw
- Brad Knowles
- Chuq Von Rospach
- J C Lawrence