Adding headers to mailman generated mails
I have a problem that might affect lots of people who send emails to large lists. AOL requires an additional header in the emails identifying the recipient for the feedback loops to unsubscribe AOL users who complain. So how do I add an additional header with a field?
Thanks for all the help!
On Tue, 2004-01-20 at 15:30, Somuchfun wrote:
I have a problem that might affect lots of people who send emails to large lists. AOL requires an additional header in the emails identifying the recipient for the feedback loops to unsubscribe AOL users who complain. So how do I add an additional header with a field?
Can you describe this additional header in more detail? Better yet would be some documentation on the web describing this header. I.e. if it's a standard (or maybe even a proposed standard), perhaps Mailman should support it out of the box.
It's actually pretty easy to add such a header. One option is to just include the header in the original message. Or you can get Mailman to add it to every message by modifying the code in the pipeline. The place to start would be Mailman/Handlers/CookHeaders.py. That's where the RFC 2369 headers are added for example.
-Barry
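The sketch below illustrates Barry's suggestion. It is a hypothetical handler in the style of Mailman 2.1's pipeline modules, which expose a `process(mlist, msg, msgdata)` function. A stand-in list object lets it run on its own.

Two points are assumptions. The header name `X-List-Tracking` is invented for this example. The header also carries only list-wide data, because at this stage of the pipeline every recipient still shares one copy of the message.

```python
from email.message import Message

# Hypothetical header name; nothing in the thread mandates a particular one.
TRACKING_HEADER = "X-List-Tracking"


def process(mlist, msg, msgdata):
    """Add a list-wide tracking header, in the style of a Mailman 2.1 handler.

    Only list-wide data belongs here: at this point in the pipeline the
    message is still shared by every recipient.
    """
    # Drop any copy supplied by the poster so the header cannot be spoofed.
    del msg[TRACKING_HEADER]
    msg[TRACKING_HEADER] = mlist.internal_name()


class FakeList:
    """Stand-in for a Mailman MailList object (assumption for this sketch)."""

    def internal_name(self):
        return "newsletter"


msg = Message()
msg["Subject"] = "Weekly update"
msg[TRACKING_HEADER] = "forged-by-poster"
process(FakeList(), msg, {})
print(msg[TRACKING_HEADER])
```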
Hello Barry, let me try to explain why the additional header is so important. When you run large lists with lots of traffic to AOL, they can put you on something called a "feedback loop". This loop generates automated emails from AOL's postmaster about people on one of your lists (as an ISP) who have clicked the "spam" button on one of the messages originating from you. For privacy reasons, however, the recipient of the email is stripped out of the feedback loop message, so AOL advises using secret headers to identify the person who complained. This is crucial because it only takes 1 complaint per 1000 recipients for AOL to put an automated temporary block on your IP address, which causes all of your traffic to bounce. Of course I could just add a mail merge code in the footer of the message, but that only seems to work with full VERP enabled in Mailman, and the slowdown is so dramatic that it is no longer feasible for a list of 50,000 or more. So what I would like to see are two things:
- Make codes like %(user_delivered_to)s in the footer work without VERP enabled
- Have the option in the GUI to add headers and use, for example, %(user_delivered_to)s in them
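One way to build the kind of "secret header" described above is sketched here under stated assumptions. The header stores a member's database ID plus a truncated HMAC. A complaint can then be mapped back to a subscriber without the address appearing in clear text, and a forged ID is rejected.

The secret, the token format and the header that would carry it are all hypothetical. AOL's actual requirements are not given in this thread. As the replies below explain, the scheme only works if each recipient receives an individual copy.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-site secret; keep it out of the message


def _sign(member_id: int) -> str:
    # Truncated HMAC-SHA256 over the decimal member ID.
    return hmac.new(SECRET, str(member_id).encode(), hashlib.sha256).hexdigest()[:16]


def make_token(member_id: int) -> str:
    """Token to place in a per-recipient header: '<id>.<signature>'."""
    return f"{member_id}.{_sign(member_id)}"


def read_token(token: str):
    """Return the member ID from a token found in a complaint, or None if invalid."""
    try:
        member_part, sig = token.strip().split(".", 1)
        member_id = int(member_part)
    except ValueError:
        return None
    return member_id if hmac.compare_digest(sig, _sign(member_id)) else None


token = make_token(42)
print(token, read_token(token))
```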
Somewhat related is the problem that the GUI cannot show all suspended members of a list. On a list with 50,000 members it makes no sense to go through them page by page to find out who was suspended because of a bounce.
An additional problem is that Mailman does not strip x-AuthenticatedSender headers from the poster's message. This header, added by authenticated SMTP, reveals very clearly who the sender is, even when the list is set to anonymous posting!
I hope this helps make Mailman better and stronger!
-----Original Message----- From: Barry Warsaw [mailto:barry@python.org] Sent: Tuesday, January 20, 2004 2:28 PM To: Somuchfun Cc: mailman-developers@python.org Subject: Re: [Mailman-Developers] Adding headers to mailman generated mails
At 2:56 PM -0800 2004/01/21, Somuchfun wrote:
Of course I could just add a mail merge code in the footer of the message, but that only seems to work with full VERP enabled in Mailman, and the slowdown is so dramatic that it is no longer feasible for a list of 50,000 or more.
If you're not using VERP and you need per-recipient data in the headers, then there is absolutely nothing that Mailman can do to help you. Mailman passes the message to the MTA in chunks of 50 or 100 recipients (or whatever you specify). You could not encode all those recipient names in the headers without exposing a great deal of private information about your recipients.
Moreover, once the message was delivered to the user's mailbox, the message and header contents alone would give no way to tell which of the 50 or 100 recipients it had been delivered to.
You could configure your MTA to split incoming envelopes and deliver a separate message to each recipient, adding per-recipient information to each one. But this would be as bad as enabling VERP (for the same reasons), and it would not give you VERP's benefits, such as much easier bounce management.
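The sketch below makes the batching point concrete by splitting a 50,000-member list into SMTP batches. `split_into_batches` is a simplified stand-in, not Mailman's own delivery code.

Each batch is one message copy shared by every recipient in it. Changing the batch size therefore changes only the number of copies. No single copy can name its recipient unless the batch size is 1.

```python
def split_into_batches(recipients, max_rcpts):
    """Split a recipient list into batches of at most max_rcpts addresses.

    Every address in a batch receives the identical message, so nothing
    in the delivered copy can say which of them it was sent to.
    """
    return [recipients[i:i + max_rcpts] for i in range(0, len(recipients), max_rcpts)]


members = [f"user{n}@example.com" for n in range(50_000)]
for size in (100, 50, 1):
    batches = split_into_batches(members, size)
    print(f"batch size {size:>3}: {len(batches):>6} message copies")
```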
So what I would like to see are two things:
- Make codes like %(user_delivered_to)s in the footer work without VERP enabled
No can do. When you have 50 recipients, which one should have their name inserted into this field?
- Have the option in the GUI to add headers and use, for example, %(user_delivered_to)s in them
Again, not possible. Mailman doesn't have the control at that point -- the MTA does. You want to keep your network traffic at a reasonable level and keep the MTA from beating the hell out of your disk drives as it delivers each copy of the message. If so, there's not much you can do.
I don't understand how AOL expects people to accomplish this sort of thing (and I used to be their Sr. Internet Mail Systems Administrator). Maybe I need to talk to Carl Hutzler.
And then an additional problem is that Mailman does not strip x-AuthenticatedSender headers from the poster's message. This header, added by authenticated SMTP, reveals very clearly who the sender is, even when the list is set to anonymous posting!
Mailman has taken a pretty strong stance towards not munging the message any more than absolutely necessary. Message body content may be filtered or converted, but the headers in particular are considered sacrosanct and will not be touched. This same approach can be found in all major MTAs that I know of.
If you want to configure your MTA to strip certain headers, that should be possible, and you should have the option of doing that. But I don't think you should expect Mailman to do this job for you.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
Brad, while I understand part of your rationale, this creates a big problem for people who run large lists and try to handle their email traffic ethically. Is there a problem with sending out the emails one by one with individual To: addresses? A header or a mail merge field in the footer could then be added without creating bounce addresses that most MTAs do not allow or understand. I do not mind the additional CPU time for sending out the messages one by one if it solves the problem for now. But since the VERP bounce-back addresses do not really exist, an Exim with verify = recipient will always have a problem. So is there a way to tweak Exim into sending the messages individually and allow the addition of a personalized footer without creating personalized bounce-back addresses?
Thanks!
-----Original Message----- From: Brad Knowles [mailto:brad.knowles@skynet.be] Sent: Wednesday, January 21, 2004 4:24 PM To: Somuchfun Cc: mailman-developers@python.org; 'Barry Warsaw' Subject: RE: [Mailman-Developers] Adding headers to mailman generated mails
At 4:46 PM -0800 2004/01/21, Somuchfun wrote:
Is there a problem with sending out the emails one by one with individual To: addresses? A header or a mail merge field in the footer could then be added without creating bounce addresses that most MTAs do not allow or understand.
No MTA ever created should have a problem with VERPs. Users might be confused by how they look, but the mail server should be able to deal with them just fine. That's the entire point of VERPs.
That said, you may not want VERPs but still want a single recipient per message generated by Mailman. In that case you should only need to adjust the maximum number of recipients per message, as specified by SMTP_MAX_RCPTS, and enable message personalization (see <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq02.002.htp> and <http://www.python.org/cgi-bin/faqw-mm.py?query=RCPT&querytype=simple&casefold=yes&req=search>). Just make sure that you don't enable actual VERP'ing along with these other changes.
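Brad's suggestion can be expressed as a hedged `mm_cfg.py` fragment. The setting names are believed to match Mailman 2.1's `Defaults.py`, but check them against the `Defaults.py` shipped with your version before relying on them. In a real `mm_cfg.py`, these overrides go after the line that imports the defaults. Personalization itself is then switched on per list from the web admin pages.

```python
# Overrides for Mailman 2.1's mm_cfg.py (names assumed from Defaults.py;
# verify against your installed version).

# One envelope recipient per SMTP transaction, so each member gets a
# separate copy that can carry per-member footer text.
SMTP_MAX_RCPTS = 1

# Allow list owners to turn on personalization in the web admin pages.
OWNERS_CAN_ENABLE_PERSONALIZATION = 1

# Keep VERP off: 0 means regular deliveries are never VERP'd, and
# personalized deliveries are not VERP'd either, so the envelope sender
# stays the ordinary list bounce address.
VERP_DELIVERY_INTERVAL = 0
VERP_PERSONALIZED_DELIVERIES = 0
```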
As part of the message personalization, add the appropriate per-user information to the list's footer template. That should hopefully deal with the problem.
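Once each member gets their own copy, per-recipient footer rendering works as sketched below. Mailman 2.1 footers use Python `%(name)s` interpolation. `real_name`, `user_delivered_to` and `user_optionsurl` are among the variables it offers for personalized lists. The `render_footer` helper and the sample member data are hypothetical.

```python
FOOTER_TEMPLATE = (
    "--\n"
    "You are subscribed to %(real_name)s as %(user_delivered_to)s.\n"
    "Unsubscribe or change options: %(user_optionsurl)s\n"
)


def render_footer(list_vars, member):
    """Fill the footer for one recipient.

    This only makes sense when each member receives a separate copy;
    with shared batches there is no single member to substitute.
    """
    values = dict(list_vars)
    values.update(member)
    return FOOTER_TEMPLATE % values


list_vars = {"real_name": "Newsletter"}
member = {
    "user_delivered_to": "alice@example.com",
    "user_optionsurl": "http://lists.example.com/mailman/options/newsletter/alice%40example.com",
}
out = render_footer(list_vars, member)
print(out)
```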
I do not mind the additional CPU time for sending out the messages one by one if it solves the problem for now. But since the VERP bounce-back addresses do not really exist, an Exim with verify = recipient will always have a problem.
This is almost certainly never a CPU time issue. It's a disk I/O capacity issue. See <http://mail.python.org/pipermail/mailman-developers/2001-June/008928.html> and the related "performance" entries in the FAQ.
Specifying too few recipients per message can kill the disk drive(s) on your server. It can also seriously reduce your message throughput and totally backlog the machine. With a large enough list, the machine can easily bury itself so deeply that it never recovers.
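The back-of-the-envelope sketch below shows why the recipients-per-copy setting dominates disk load. The 10 KB message size is an assumed figure. The function counts only bytes written to the queue. It does not count the per-file metadata updates, which grow with the number of copies and are often the bigger cost.

```python
def bytes_spooled(members, msg_bytes, rcpts_per_copy):
    """Rough queue load: one queued copy per batch of recipients.

    Returns (number of copies, total bytes written). Per-file metadata
    work (directory entries, synchronous updates) is not modeled.
    """
    copies = -(-members // rcpts_per_copy)  # ceiling division
    return copies, copies * msg_bytes


for batch in (50, 1):
    copies, total = bytes_spooled(50_000, 10 * 1024, batch)
    print(f"{batch:>2} rcpts/copy: {copies:>6} queue files, {total / 2**20:.0f} MiB written")
```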
So is there a way to tweak Exim into sending the messages individually and allow the addition of a personalized footer without creating personalized bounce-back addresses?
I am not familiar with Exim. I do not know what configuration changes would be required to get it to add personalized information in the headers. You would need to talk to someone better acquainted with Exim, presumably on an Exim-specific mailing list or newsgroup.
I don't remember whether Postfix has a facility that would let it perform per-message/per-recipient header modifications. I'd have to check the latest documentation, logs, etc.
I can tell you that the default standard installation of sendmail will do this for you automatically. If there is one and only one recipient of a message, the "$u" macro will be defined, and the identified username will be shown in the "Received:" headers. If there is more than one recipient for the message, then this macro will not be defined, and no usernames will be displayed.
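When a complaint does preserve the headers, the single-recipient `for <...>` clause can be recovered as sketched below. The sketch assumes a Received: header shaped like the invented sample. Real header formats vary by MTA and configuration.

```python
import re
from email import message_from_string

# Matches the "for <address>" clause a Received: header carries when the
# MTA recorded a single recipient for the delivery.
FOR_CLAUSE = re.compile(r"\bfor\s+<([^>]+)>", re.IGNORECASE)


def recipients_from_received(msg):
    """Return addresses named in 'for <...>' clauses, in header order."""
    found = []
    for header in msg.get_all("Received", []):
        # Received headers are usually folded; collapse whitespace first.
        match = FOR_CLAUSE.search(" ".join(header.split()))
        if match:
            found.append(match.group(1))
    return found


sample = (
    "Received: from lists.example.com (lists.example.com [192.0.2.10])\n"
    "\tby mx.example.net (8.12.10/8.12.10) with ESMTP id i0M1234\n"
    "\tfor <alice@example.net>; Thu, 22 Jan 2004 10:00:00 -0500\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
result = recipients_from_received(message_from_string(sample))
print(result)
```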
Unfortunately, most headers tend to be missing from most complaints, so your best bet would probably be to get Mailman to put the message personalization information into the footer of the message, which is more likely to survive.
-- Brad Knowles, <brad.knowles@skynet.be>
On Wed, 2004-01-21 at 21:21, Brad Knowles wrote:
As part of the message personalization, add the appropriate per-user information in the template footer for the list. That should hopefully deal with the problem.
I should mention that Mailman 2.1's full personalization support (as opposed to VERP header support) isn't terribly efficient. I have what I think will be a very nice scheme to do this about as fast as you can do in Python, but it requires Python 2.3 so it's slated for Mailman 3. The nice upside is that you could conceivably support the templatizing and personalization of any content, not limited to the footer, header, or mail headers. I believe I can make it efficiently configurable so a list owner could disable content personalization of say the original message for discussion lists, but enable it for newsletters. We'll ignore for now the question of where the personalized content comes from.
In a previous message, Brad gave great answers and links that are well worth re-reading every few months. So I won't rehash anything I agree with.
So is there a way to tweak Exim into sending the messages individually and allow the addition of a personalized footer without creating personalized bounce-back addresses?
I am not familiar with Exim. I do not know what configuration changes would be required to get it to add personalized information in the headers. You would need to talk to someone better acquainted with Exim, presumably on an Exim-specific mailing list or newsgroup.
Exim has some very nice capabilities which can be used to embed an interpreter like Python in your MTA. For example, we use this on incoming messages on python.org to filter everything through SpamBayes and do other programmatically interesting checks on the message. Yes, it slows down message acceptance a bit, but it's worth it for us.
Nigel can provide details, but I think the same embedding feature could be used to have the MTA do the final stitching of content template and personalized data. It would be A Project to hack together, but I think it could be a neat idea to play with, although I'm not sure how much it would help.

Certainly, pushing the stitching down into the MTA, closer to the external socket connection, would reduce disk i/o on the mail server. Mailman could go back to handing one copy of the message to the MTA, plus some job description of where to get the personalized information. Imagine a SQL select statement, for instance. The MTA could then do what Mailman does here: not create a disk image for each instance of the message, but stitch it together in memory as it goes out on the wire. I think that would greatly improve disk contention. You wouldn't help bandwidth, but then if JC's evaluation is accurate, that penalty is a mere <wink> doubling of bandwidth.
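Barry's "template plus job description" idea can be illustrated with an in-memory SQLite table standing in for the subscriber database. Everything in the sketch is hypothetical and is not an existing MTA interface: the `JOB` structure, the schema and `stitched_messages`. It only shows the shape of the idea. There is one shared template and one query, and the per-recipient text exists in memory only while it is being produced.

```python
import sqlite3
from string import Template

# Hypothetical job description: a shared template plus a query that
# yields the per-recipient fields.
JOB = {
    "template": Template("To: $address\nSubject: News\n\nHello $name,\n...\n"),
    "query": "SELECT address, name FROM members WHERE list = ? ORDER BY address",
    "params": ("newsletter",),
}


def stitched_messages(conn, job):
    """Yield (recipient, message text) pairs built in memory one at a time,
    so no per-recipient copy is ever written to disk."""
    for address, name in conn.execute(job["query"], job["params"]):
        yield address, job["template"].substitute(address=address, name=name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (list TEXT, address TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [("newsletter", "alice@example.com", "Alice"),
     ("newsletter", "bob@example.com", "Bob"),
     ("other", "carol@example.com", "Carol")],
)
for rcpt, text in stitched_messages(conn, JOB):
    # A real MTA would write `text` to the outgoing SMTP connection here.
    print(rcpt, len(text))
```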
I don't think Postfix has the same embedding capabilities, although I haven't looked at what Postfix 2.1 may provide.
I can tell you that the default standard installation of sendmail will do this for you, automatically. If there is one and only one recipient of a message, the "$u" macro will be defined, and the identified username will be shown in the "Received:" headers. If there is more than one recipient for the message, then this macro will not be defined, and no usernames will be displayed.
In a sense, that's what we've talked about before. If there were a standard language that the mail server and list manager could agree on for both defining the template, and defining the per-recipient data source, we could have a more efficient mechanism, with perhaps a hope of mta agnosticism.
Unfortunately, most headers tend to be missing from most complaints, so your best bet would probably be to get Mailman to put the message personalization information into the footer of the message, which is more likely to survive.
As for stripping headers, I do think there's some value in being able to more easily configure the headers to strip for both regular messages and anonymized messages. OTOH, it's easy to hack the source. Cleanse.py in Mailman/Handlers is the place to look.
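The kind of change Barry means is sketched below as a standalone function rather than an actual patch to `Cleanse.py`. For anonymous lists it drops a configurable set of headers that could identify the poster. The header list is an assumption, to be adapted. Header deletion in `email.message.Message` is case-insensitive and removes every occurrence, so the poster's `x-AuthenticatedSender` is caught as well.

```python
from email.message import Message

# Hypothetical set of headers to drop for anonymous lists; adapt as needed.
ANONYMOUS_STRIP = ("X-AuthenticatedSender", "X-Originating-IP", "Sender", "Reply-To")


def strip_identifying_headers(msg, anonymous_list):
    """Remove headers that could reveal the poster (Cleanse.py-style sketch)."""
    if not anonymous_list:
        return
    for name in ANONYMOUS_STRIP:
        # Removes every occurrence, matched case-insensitively; no error if absent.
        del msg[name]


msg = Message()
msg["From"] = "list@example.com"
msg["X-AuthenticatedSender"] = "poster@example.org"
strip_identifying_headers(msg, anonymous_list=True)
print(msg.keys())
```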
-Barry
At 5:50 PM -0500 2004/01/22, Barry Warsaw wrote:
Nigel can provide details, but I think the same embedding feature could be used to have the MTA do the final stitching of content template and personalized data. It would be A Project to hack together, but I think it could be a neat idea to play with, although I'm not sure how much it would help.
This is similar to what Eric Allman (at that time, before Sendmail Inc. existed), Bryan Costales (at the time, working for InfoBeat/Mercury Mail), and I (working at AOL) were discussing back in 1996. The idea was a Mail-Merge Transport Protocol (MMTP) server, based on a modified version of sendmail, along with a standard language for transmitting that content. With MMTP servers on both ends, it would not matter how many thousands or millions of recipients you might have. Only one copy of the message body would be transmitted, and all the rest would be filled in on the remote end.
We ultimately gave up on this idea because we realized that it would make the spam problem much, much worse. The same things that help regular MTAs transmit millions of customized messages per hour to their paying customers would probably allow spammers to transmit billions of messages per hour to everyone in the universe.
Certainly the spam issue needs to be addressed before any serious discussion of creating something like an MMTP server. That includes trying to make it a standard that programs like sendmail, Postfix, and Exim would be expected to implement. You need to be able to prove that this cannot be abused to generate spam instead.
If the MTA could do what Mailman does here -- not creating a disk image for each instance of the message, but stitching it together in memory as it's going out on the wire -- I think you'd greatly improve disk contention.
I'm not sure that the MTA could safely do that in memory. At least, it would be difficult to ensure that the MTA gets this done right. This would be akin to handling the entire message queue in memory for all messages, something which can't really be done safely except under very strict circumstances.
The only MTA I know of that is capable of doing things like this is the latest release of version 8 sendmail. Even then, by default it only handles messages in memory if they are no larger than 4KB.
Yes, filesystem I/O is the number one killer, specifically synchronous meta-data updates.
But then people on this list have heard me harping on this subject for a long time, and should know by now that I will refer them to Nick Christenson's book _Sendmail Performance Tuning_ (see <http://www.jetcafe.org/~npc/book/sendmail/>), or my own slides from an invited talk entitled "Sendmail Performance Tuning for Large Systems" at <http://www.shub-internet.org/brad/papers/sendmail-tuning/>.
I don't think Postfix has the same embedding capabilities, although I haven't looked at what Postfix 2.1 may provide.
I'm not aware of anything like this, but I'd have to check.
In a sense, that's what we've talked about before. If there were a standard language that the mail server and list manager could agree on for both defining the template, and defining the per-recipient data source, we could have a more efficient mechanism, with perhaps a hope of mta agnosticism.
That would be nice. However, I fear that we have much more basic and much more serious problems. These need to be resolved before we can start worrying about such subjects as increasing efficiency in the interfaces between MLMs and MTAs.
-- Brad Knowles, <brad.knowles@skynet.be>
On Thu, 2004-01-22 at 18:21, Brad Knowles wrote:
This is similar to what Eric Allman (at that time, before Sendmail Inc. existed), Bryan Costales (at the time, working for InfoBeat/Mercury Mail), and I (working at AOL) were discussing back in 1996, in the creation of a Mail-Merge Transport Protocol (MMTP) server, based on a modified version of sendmail along with a standard language for transmitting that content. With MMTP servers on both ends, it would not matter how many thousands or millions of recipients you might have, only one copy of the message body would be transmitted, and all the rest would be filled in on the remote end.
We ultimately gave up on this idea because we realized that it would make the spam problem much, much worse. The same things that help regular MTAs transmit millions of customized messages per hour to their paying customers would probably allow spammers to transmit billions of messages per hour to everyone in the universe.
Very interesting. Some thoughts: there would still be some benefit here for Mailman if we simply limited access to the MMTP to the localhost interface. That's how Mailman hands stuff off to the MTA, and in our Exim configuration, we do quite a bit of special casing for localhost connections. The advantage here is that Mailman could go back to batching deliveries to its worker mail server, reducing both the bandwidth between the two processes and the disk i/o on the worker mta.
(Read 'localhost' as privileged connection, e.g. Mailman feeding a smurf farm.)
I had been thinking along the lines of the language for specifying the data source as a db connection and a SQL command. I wouldn't want to do that across the Internet! OTOH, with a protocol like MMTP, I suppose you'd have to send all the data for all the recipients in the same transaction, and the bandwidth trade-off would depend on the size of the recipient-centric data.
Certainly, before any serious discussion of creating something like an MMTP server, and trying to make that a standard which you would expect programs like sendmail, postfix, and Exim to implement, I believe that the spam issue needs to be addressed. You need to be able to prove how this cannot be abused to generate spam instead.
That's certainly tricky, but I think it's got to boil down to privilege or authentication. It would still make me nervous to accept such jobs from other than sites I control.
As an aside: the spam issue is already a huge nightmare for list servers. For example, every once in a while we get spamcop reports targeting python.org. Why is that? Well, we filter all email destined to our lists through various levels of spam defenses, but crap does slip through. And then /we/ get flagged as the originator of the spam. That's just one issue related to spam we have to deal with.
If the MTA could do what Mailman does here -- not creating a disk image for each instance of the message, but stitching it together in memory as it's going out on the wire -- I think you'd greatly improve disk contention.
I'm not sure that the MTA could safely do that in memory. At least, it would be difficult to ensure that the MTA gets this done right. This would be akin to handling the entire message queue in memory for all messages, something which can't really be done safely except under very strict circumstances.
What I was thinking was something along the lines of storing the template and the 'job description', a concise definition of how to get the recipient-centric data. The jobs would have to be small enough so that they could be reliably dequeued, stitched and sent while still making the guarantees an MTA has to make. I'm just hand-waving here of course, and the rest is left as a simple matter of engineering <wink>.
In a sense, that's what we've talked about before. If there were a standard language that the mail server and list manager could agree on for both defining the template, and defining the per-recipient data source, we could have a more efficient mechanism, with perhaps a hope of mta agnosticism.
That would be nice. However, I fear that we have much more basic problems that are much more serious, and which need to be resolved before we can expect to start worrying about such subjects as increasing efficiency in the interfaces between MLMs and MTAs.
I should mention that I'm specifically interested in increasing the efficiency between Mailman and its local worker MTAs. These are all systems under my control so I should be able to tune them, set up privileges, common data source access, etc. to make things work as smoothly as possible until the message hits the external outgoing interface. After that, we have to play nice and standard.
I'm not even touching the 3rd rail of putting the MTA /in/ Mailman any more :).
-Barry
At 6:56 PM -0500 2004/01/22, Barry Warsaw wrote:
I had been thinking along the lines of the language for specifying the data source as a db connection and a SQL command.
We were working on a method to send a specially MIME-formatted message. It would identify the various potential bodyparts, list the recipients and say who gets which bodyparts. A template language would then pull in the appropriate bodyparts for the appropriate recipients, with ways to insert bits of information about the users themselves into the message.
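That design was never published, so the sketch below only illustrates the general shape Brad describes, using invented data. Named body parts are sent once. A per-recipient record says which parts each person gets. A template pulls them in on the receiving side.

```python
# Hypothetical job contents, inspired by the description above.
BODYPARTS = {
    "intro": "Today's headlines:\n",
    "sports": "- Local team wins.\n",
    "weather": "- Sunny tomorrow.\n",
}
RECIPIENTS = {
    "alice@example.com": {"name": "Alice", "parts": ["intro", "sports"]},
    "bob@example.com": {"name": "Bob", "parts": ["intro", "weather", "sports"]},
}
TEMPLATE = "Hello {name},\n\n{body}"


def expand(bodyparts, recipients, template):
    """Assemble each recipient's message on the receiving side from one
    shared copy of the parts plus a small per-recipient record."""
    out = {}
    for address, info in recipients.items():
        body = "".join(bodyparts[p] for p in info["parts"])
        out[address] = template.format(name=info["name"], body=body)
    return out


for addr, text in expand(BODYPARTS, RECIPIENTS, TEMPLATE).items():
    print(addr, repr(text))
```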
That's certainly tricky, but I think it's got to boil down to privilege or authentication. It would still make me nervous to accept such jobs from other than sites I control.
Yup.
I'm just hand-waving here of course, and the rest is left as a simple matter of engineering <wink>.
Yeah. Hand-waving. Right.
I should mention that I'm specifically interested in increasing the efficiency between Mailman and its local worker MTAs.
The problem is that this is a very small piece of the overall puzzle, and you're talking about a lot of things that others might also want in other areas. MTA authors are going to be primarily concerned with overall optimization of the MTA in general. Certain employers of certain authors might also prioritize things like WAN efficiency over other concerns.
I can tell you that with some relatively minor modifications to sendmail, the folks at InfoBeat/Mercury Mail got things to the point where sendmail was no longer the bottleneck -- pulling the data out of the database was the big problem, and much more difficult to solve.
These are all systems under my control so I should be able to tune them, set up privileges, common data source access, etc. to make things work as smoothly as possible until the message hits the external outgoing interface.
You're talking about picking up a small piece of the puzzle which is likely to have significant components that will be difficult or nearly impossible to solve.
As far as efficiency is concerned, there are other areas within Mailman where I would be inclined to focus my attention. Instead of using pickles, try Berkeley DB b-trees, and use that as your "queue" to be processed. B-trees are designed for lightning-fast cursor access, and all you need to do is make sure they're indexed on certain key fields. Let Berkeley DB take care of the reliability issues (what happens if there's a crash), caching information efficiently in memory for maximum performance, etc.
I'm not even touching the 3rd rail of putting the MTA /in/ Mailman any more :).
That solution is called L-Soft Listserv, with LSMTP. In fact, you're talking about some of the same sorts of things that they do. The difference is that you don't want to own both pieces of code that implement the desired standard, which will make solving the problem orders of magnitude more difficult.
-- Brad Knowles, <brad.knowles@skynet.be>
On Thu, 2004-01-22 at 19:31, Brad Knowles wrote:
At 6:56 PM -0500 2004/01/22, Barry Warsaw wrote:
I had been thinking along the lines of the language for specifying the data source as a db connection and a SQL command.
We were working on a method to send a specially MIME-formatted message to identify the various potential bodyparts, a list of recipients and who gets which bodyparts, then a language for specifying the template which pulls in the appropriate bodyparts for the appropriate recipients (and ways to insert bits of information about the user themselves into the message).
Did you publish any of this? I'd like to read it.
I think there are other areas in which I would be inclined to focus my attentions within Mailman, at least as far as efficiency is concerned. Instead of using pickles, try Berkeley db b-trees, and use that as your "queue" to be processed. The reason here is that b-trees are designed for lightning-fast cursor access, and all you need to do is make sure it's indexed on certain key fields. Let Berkeley db take care of the reliability issues (what happens if there's a crash), efficiently caching information in memory for maximum performance, etc....
Mailman 3 will definitely be database backed, via an interface that allows different back-ends to be pluggable. My prototypes use BerkeleyDB through Python 2.3's standard bsddb module.
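A minimal sketch of what a pluggable back-end interface might look like, assuming member records are stored as simple key/value pairs. `MemberStore`, `DictStore`, `DbmStore`, and `subscribe` are hypothetical names, and the standard-library `dbm` module stands in for BerkeleyDB, since `bsddb` is no longer part of Python 3's standard library.

```python
import dbm
import json
from typing import Optional, Protocol


class MemberStore(Protocol):
    """Hypothetical interface that every storage back-end would implement."""

    def get(self, address: str) -> Optional[dict]: ...
    def put(self, address: str, record: dict) -> None: ...


class DictStore:
    """In-memory back-end, handy for tests."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, address: str) -> Optional[dict]:
        return self._data.get(address)

    def put(self, address: str, record: dict) -> None:
        self._data[address] = record


class DbmStore:
    """On-disk back-end using whichever dbm implementation is available."""

    def __init__(self, path: str) -> None:
        self._db = dbm.open(path, "c")  # create the database if it is missing

    def get(self, address: str) -> Optional[dict]:
        raw = self._db.get(address.encode())
        return None if raw is None else json.loads(raw)

    def put(self, address: str, record: dict) -> None:
        self._db[address.encode()] = json.dumps(record).encode()

    def close(self) -> None:
        self._db.close()


def subscribe(store: MemberStore, address: str, name: str) -> None:
    # Callers depend only on the interface, never on a concrete back-end.
    store.put(address, {"name": name})
```

Swapping back-ends then means passing a different object to `subscribe`; nothing else in the calling code changes.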
However, my experience with transactional BerkeleyDB's performance doesn't make me confident about using it for the queue runner subsystem. We'll very likely stick with the file-based qrunner architecture, although I've worked out a way to use only one file per message.
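The one-file-per-message scheme isn't described here, so the following is only a guess at its general shape, not Mailman's actual queue code: the message and its metadata are pickled together into a single file, written under a temporary name, and renamed into place so a queue runner never sees a half-written entry. `enqueue` and `dequeue_all` are hypothetical helpers.

```python
import hashlib
import os
import pickle
import tempfile
import time


def enqueue(queue_dir: str, message: bytes, metadata: dict) -> str:
    """Store the message and its metadata together in one queue file."""
    digest = hashlib.sha1(message).hexdigest()
    # Zero-padded nanosecond prefix: lexical order matches arrival order.
    name = f"{time.time_ns():020d}+{digest}.pck"
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as fp:
        pickle.dump((message, metadata), fp)
        fp.flush()
        os.fsync(fp.fileno())
    # os.replace is atomic on POSIX: readers see no entry or a complete one.
    final_path = os.path.join(queue_dir, name)
    os.replace(tmp_path, final_path)
    return final_path


def dequeue_all(queue_dir: str):
    """Yield (message, metadata) pairs oldest-first, deleting each file."""
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".pck"):
            continue  # skip temporary files that are still being written
        path = os.path.join(queue_dir, name)
        with open(path, "rb") as fp:
            message, metadata = pickle.load(fp)
        yield message, metadata
        # Reached only when the caller asks for the next entry,
        # i.e. after it has finished with this one.
        os.remove(path)
```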
I'm not even touching the 3rd rail of putting the MTA /in/ Mailman any more :).
That solution is called L-Soft Listserv, with LSMTP. In fact, you're talking about some of the same sorts of things that they do, only you don't want to own both pieces of code that are implementing the desired standard, which will make solving the problem orders of magnitude more difficult.
I know that rail is shiny, but I'm not touching it. :)
-Barry
At 12:45 PM -0500 2004/01/24, Barry Warsaw wrote:
Did you publish any of this? I'd like to read it.
Nope. All private conversations with Eric and Bryan.
Unfortunately, I've lost all mailboxes I had while I was at AOL, so I can't pull those messages back up.
Sorry, guy. ;-(
Mailman 3 will definitely be database backed, via an interface that allows different back-ends to be pluggable. My prototypes use BerkeleyDB through Python 2.3's standard bsddb module.
Cool.
However, my experience with transactional BerkeleyDB's performance doesn't make me confident about using it for the queue runner subsystem. We'll very likely stick with the file-based qrunner architecture, although I've worked out a way to use only one file per message.
My experience with Berkeley DB has been that you store the actual content in files, and you put meta-data in the database (with a field that tells you the full path to the file). The file is touched only once in creation, read one or more times (on message delivery), and then deleted when all copies have been delivered. All other activity occurs within the database.
Used that way, it's blindingly fast, unbreakable, and amazingly efficient with memory. However, I'm not convinced that using a standard Python access module is the way to get the best out of it -- I don't know how reliable that module is, and it could be a significant drain on the capabilities of Berkeley DB itself.
-- Brad Knowles, <brad.knowles@skynet.be>
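A minimal sketch of the split Brad describes, with sqlite3 standing in for Berkeley DB purely for illustration: the body is written once to a spool file, one database row per undelivered recipient records where it lives, and the file is deleted when the last row goes away. `SpoolIndex` and its methods are hypothetical.

```python
import os
import sqlite3
import tempfile


class SpoolIndex:
    """Message bodies live in files; per-recipient state lives in a database."""

    def __init__(self, spool_dir: str, db_path: str = ":memory:") -> None:
        self.spool_dir = spool_dir
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            " path TEXT NOT NULL, recipient TEXT NOT NULL,"
            " PRIMARY KEY (path, recipient))"
        )

    def add(self, content: bytes, recipients: list) -> str:
        # The content file is written exactly once, at creation time.
        fd, path = tempfile.mkstemp(dir=self.spool_dir)
        with os.fdopen(fd, "wb") as fp:
            fp.write(content)
        with self.db:  # one transaction for all recipient rows
            self.db.executemany(
                "INSERT INTO pending (path, recipient) VALUES (?, ?)",
                [(path, rcpt) for rcpt in recipients],
            )
        return path

    def read(self, path: str) -> bytes:
        # Read once per delivery attempt; never rewritten.
        with open(path, "rb") as fp:
            return fp.read()

    def delivered(self, path: str, recipient: str) -> None:
        # Delivery bookkeeping touches only the database...
        with self.db:
            self.db.execute(
                "DELETE FROM pending WHERE path = ? AND recipient = ?",
                (path, recipient),
            )
            (left,) = self.db.execute(
                "SELECT COUNT(*) FROM pending WHERE path = ?", (path,)
            ).fetchone()
        # ...until the last copy is out, when the content file is removed.
        if left == 0:
            os.remove(path)
```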
Brad, I did some testing with a 50,000-member list, and the time it takes to send when using personalization is twice as long (8 hrs instead of 4). Of course that is bad news. I also tested some other mailing list software, like Gordano's Communicator and Lyris ListManager, and they both seem to be able to send in batches but still add or include personalized fields to trace the messages. So if they can do it, my simple question is: why can't Mailman do it and still send in batches?
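Translating the reported timings into per-recipient cost (simple arithmetic on the figures above, assuming a constant delivery rate over the whole run):

```latex
\begin{aligned}
t_{\text{batched}}      &= \frac{4\,\text{h} \times 3600\,\text{s/h}}{50\,000} = 0.288\,\text{s per recipient} \\
t_{\text{personalized}} &= \frac{8\,\text{h} \times 3600\,\text{s/h}}{50\,000} = 0.576\,\text{s per recipient}
\end{aligned}
```

So personalization adds roughly 0.29 s of work per recipient, halving throughput from 12,500 to 6,250 messages per hour.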
-----Original Message-----
From: Brad Knowles [mailto:brad.knowles@skynet.be]
Sent: Thursday, January 22, 2004 3:21 PM
To: Barry Warsaw
Cc: Brad Knowles; Somuchfun; mailman-developers@python.org
Subject: RE: [Mailman-Developers] Adding headers to mailman generated mails
At 5:50 PM -0500 2004/01/22, Barry Warsaw wrote:
Nigel can provide details, but I think the same embedding feature could be used to have the MTA do the final stitching of content template and personalized data. It would be A Project to hack together, but I think it could be a neat idea to play with, although I'm not sure how much it would help.
This is similar to what Eric Allman (at that time, before Sendmail Inc. existed), Bryan Costales (at the time, working for InfoBeat/Mercury Mail), and I (working at AOL) were discussing back in 1996, in the creation of a Mail-Merge Transport Protocol (MMTP) server, based on a modified version of sendmail along with a standard language for transmitting that content. With MMTP servers on both ends, it would not matter how many thousands or millions of recipients you might have, only one copy of the message body would be transmitted, and all the rest would be filled in on the remote end.
We ultimately gave up on this idea because we realized that it would make the spam problem much, much worse. The same things that help regular MTAs transmit millions of customized messages per hour to their paying customers would probably allow spammers to transmit billions of messages per hour to everyone in the universe.
Certainly, before any serious discussion of creating something like an MMTP server, and trying to make that a standard which you would expect programs like sendmail, postfix, and Exim to implement, I believe that the spam issue needs to be addressed. You need to be able to prove how this cannot be abused to generate spam instead.
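Brad notes elsewhere in the thread that the MMTP work was never published, so the following only illustrates the general idea described here and in his note about specially MIME-formatted messages: each bodypart travels once, alongside a recipient list saying which parts each recipient gets, and the receiving end assembles the copies. `MergeJob` and `Recipient` are hypothetical names, and `string.Template` merely stands in for the unspecified template language. It carries exactly the abuse risk Brad raises.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, Iterator, List, Tuple


@dataclass
class Recipient:
    address: str
    parts: List[str]  # ids of the shared bodyparts this recipient receives
    fields: Dict[str, str] = field(default_factory=dict)  # per-recipient data


@dataclass
class MergeJob:
    """Hypothetical merge payload: every bodypart is carried exactly once."""

    bodyparts: Dict[str, str]
    recipients: List[Recipient]

    def expand(self) -> Iterator[Tuple[str, str]]:
        """Run on the receiving end: assemble each recipient's body locally."""
        for rcpt in self.recipients:
            text = "\n".join(self.bodyparts[pid] for pid in rcpt.parts)
            # safe_substitute leaves unknown $placeholders untouched
            yield rcpt.address, Template(text).safe_substitute(rcpt.fields)
```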
If the MTA could do what Mailman does here -- not creating a disk image for each instance of the message, but stitching it together in memory as it's going out on the wire -- I think you'd greatly reduce disk contention.
I'm not sure that the MTA could safely do that in memory. At least, it would be difficult to ensure that the MTA gets this done right. This would be akin to handling the entire message queue in memory for all messages, something which can't really be done safely except under very strict circumstances.
The only MTA I know of that is capable of doing things like this is the latest release of version 8 sendmail, and even then it defaults to handling messages in memory that are no larger than 4KB.
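To make the "stitch it together in memory" idea concrete, here is a sketch done in the list manager rather than the MTA: the shared message is serialized once, and each recipient's copy is that buffer with one personalized header line placed in front, so no per-recipient file is ever written. The header name `X-Recipient-ID` and the token scheme are hypothetical; the thread never specifies the header AOL actually expects.

```python
from email.message import EmailMessage
from email.policy import SMTP
from typing import Iterable, Iterator, Tuple


def stitch(template: EmailMessage,
           recipients: Iterable[Tuple[str, str]],
           header: str = "X-Recipient-ID") -> Iterator[Tuple[str, bytes]]:
    """Yield (address, wire bytes) per recipient without per-recipient files.

    Tokens are assumed to be plain ASCII identifiers.
    """
    shared = template.as_bytes(policy=SMTP)  # serialized once, CRLF line endings
    for address, token in recipients:
        if "\r" in token or "\n" in token:
            # Refuse values that could inject extra header lines.
            raise ValueError("header value must not contain line breaks")
        yield address, f"{header}: {token}\r\n".encode("ascii") + shared
```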
Yes, filesystem I/O is the number one killer, specifically synchronous meta-data updates.
But then people on this list have heard me harping on this subject for a long time, and should know by now that I will refer them to Nick Christenson's book _Sendmail Performance Tuning_ (see <http://www.jetcafe.org/~npc/book/sendmail/>), or my own slides from an invited talk entitled "Sendmail Performance Tuning for Large Systems" at <http://www.shub-internet.org/brad/papers/sendmail-tuning/>.
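One general way to cut synchronous meta-data updates is to stop creating, renaming, and unlinking a file per queued item (each of which updates a directory) and instead append to one long-lived journal, paying for one fsync per batch. The sketch below illustrates that trade-off in isolation; `BatchJournal` and `read_journal` are hypothetical and are not how sendmail, Postfix, or Mailman manage their queues.

```python
import os


class BatchJournal:
    """Append-only journal that pays for one fsync per batch, not per record."""

    def __init__(self, path: str, batch_size: int = 100) -> None:
        # One long-lived file: no per-record create/rename/unlink.
        self._fp = open(path, "ab")
        self._batch_size = batch_size
        self._pending = 0
        self.fsyncs = 0  # counted so the saving is easy to observe

    def append(self, record: bytes) -> None:
        # Length-prefixed records so they can be read back unambiguously.
        self._fp.write(len(record).to_bytes(4, "big") + record)
        self._pending += 1
        if self._pending >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._fp.flush()
            os.fsync(self._fp.fileno())
            self.fsyncs += 1
            self._pending = 0

    def close(self) -> None:
        self.flush()
        self._fp.close()


def read_journal(path: str) -> list:
    records = []
    with open(path, "rb") as fp:
        while header := fp.read(4):
            records.append(fp.read(int.from_bytes(header, "big")))
    return records
```

The cost is durability: records appended since the last flush can be lost in a crash.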
I don't think Postfix has the same embedding capabilities, although I haven't looked at what Postfix 2.1 may provide.
I'm not aware of anything like this, but I'd have to check.
In a sense, that's what we've talked about before. If there were a standard language that the mail server and list manager could agree on for both defining the template and defining the per-recipient data source, we could have a more efficient mechanism, with perhaps a hope of MTA agnosticism.
That would be nice. However, I fear that we have much more basic problems that are much more serious, and which need to be resolved before we can expect to start worrying about such subjects as increasing efficiency in the interfaces between MLMs and MTAs.
-- Brad Knowles, <brad.knowles@skynet.be>
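A sketch of the data-source half of such a language, following Barry's earlier idea of specifying it as a db connection plus a SQL command: each row of the query becomes one recipient's personalization fields. sqlite3 and `string.Template` are stand-ins (no such standard language exists in this thread), and `merge` is a hypothetical helper that assumes the query returns an `address` column.

```python
import sqlite3
from string import Template
from typing import Iterator, Tuple


def merge(conn: sqlite3.Connection, query: str,
          template: str) -> Iterator[Tuple[str, str]]:
    """Yield (address, personalized text), one per row of the data source.

    Every column of the query is available to the template as $column.
    """
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # rows addressable by column name
    tmpl = Template(template)
    for row in cur.execute(query):
        fields = {key: row[key] for key in row.keys()}
        yield fields["address"], tmpl.safe_substitute(fields)
```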
At 1:04 PM -0800 2004/01/23, Somuchfun wrote:
I did some testing with a 50,000-member list, and the time it takes to send when using personalization is twice as long (8 hrs instead of 4).
Not surprising. See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.012.htp>.
Of course that is bad news. I also tested some other mailing list software, like Gordano's Communicator and Lyris ListManager, and they both seem to be able to send in batches but still add or include personalized fields to trace the messages. So if they can do it, my simple question is: why can't Mailman do it and still send in batches?
Tell me how they do it, and I might be able to tell you what the issue is with Mailman doing the same.
-- Brad Knowles, <brad.knowles@skynet.be>
On Fri, 2004-01-23 at 16:04, Somuchfun wrote:
Brad, I did some testing with a 50,000-member list, and the time it takes to send when using personalization is twice as long (8 hrs instead of 4).
This is not surprising given Chuq's analysis.
Of course that is bad news. I also tested some other mailing list software, like Gordano's Communicator and Lyris ListManager, and they both seem to be able to send in batches but still add or include personalized fields to trace the messages. So if they can do it, my simple question is: why can't Mailman do it and still send in batches?
Maybe they do XVERP? There's a patch on SF (SourceForge) I haven't looked at yet that adds XVERP support to Mailman. But that will only help you with bounce detection, and only then with compliant servers. AFAIK, XVERP isn't yet a standard.
-Barry
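Assuming a Postfix-style MTA that advertises the XVERP extension in its EHLO response, a list manager could keep handing over one message with many recipients and let the MTA generate the per-recipient envelope senders. `send_batch` is hypothetical; `smtplib.SMTP.sendmail` does accept extra MAIL FROM parameters through `mail_options`. As noted above, this only helps bounce detection, and only with servers that implement the extension.

```python
import smtplib
from typing import Sequence


def send_batch(smtp: smtplib.SMTP, sender: str, recipients: Sequence[str],
               message: bytes) -> dict:
    """Hand one copy of the message and many recipients to the MTA.

    If the server advertises XVERP, the MTA (not the list manager) rewrites
    the envelope sender per recipient, so delivery stays batched.
    """
    smtp.ehlo_or_helo_if_needed()
    options = ["XVERP"] if smtp.has_extn("xverp") else []
    # sendmail() returns a dict describing any refused recipients.
    return smtp.sendmail(sender, list(recipients), message, mail_options=options)
```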
"baw" == Barry Warsaw "RE: [Mailman-Developers] Adding headers to mailman generated mails" Thu, 22 Jan 2004 17:50:33 -0500
baw> I don't think Postfix has the same embedding capabilities,
baw> although I haven't looked at what Postfix 2.1 may provide.
Hmm... dunno, but perhaps XFORWARD as in postfix->filter->postfix. Found in recent snapshots and promised with improved documentation for 2.1.
jam
On Fri, 2004-01-23 at 11:50, John A. Martin wrote:
Hmm... dunno, but perhaps XFORWARD as in postfix->filter->postfix. Found in recent snapshots and promised with improved documentation for 2.1.
Interesting. That's a possible approach, although Exim's ability to embed an interpreter in the MTA is pretty powerful.
As an aside, looking through the Postfix 2.1 changelog, I noticed that Errors-To support has been removed. ;)
-Barry
On Thu, 2004-01-22 at 11:56, Somuchfun wrote:
Hello Barry, Let me try to explain why the additional header is so important. When you use large lists with lots of traffic to AOL, they can put you on something called a "feedback loop". This loop generates automated emails from AOL's postmaster about people on one of your lists (as an ISP) who have clicked the "spam" button on one of the messages originating from you.
If this is really important to AOL, seeing their 'official' documentation on this would be nice, to make sure Mailman does implement it right.
Just turning on VERP may work, and would be less overhead than full personalisation if you use one of the Mailman patches that gets postfix or qmail to take on some of the VERP overhead.
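Plain VERP identifies the recipient by giving each copy its own envelope sender, which is also why it costs one SMTP transaction per recipient. A sketch of the encoding, assuming the common `bounces+mailbox=host@domain` layout (the exact format varies by configuration, so treat this one as an assumption); `verp_encode` and `verp_decode` are hypothetical helpers. Whether AOL's feedback-loop reports preserve the envelope sender is not established in this thread.

```python
import re

# Assumed layout: <bounces-local>+<mailbox>=<host>@<list-domain>
_VERP_RE = re.compile(
    r"^(?P<local>[^+@]+)\+(?P<mailbox>.+)=(?P<host>[^=@]+)@(?P<domain>[^@]+)$"
)


def verp_encode(bounces: str, recipient: str) -> str:
    """Fold the recipient's address into a per-recipient envelope sender."""
    local, _, domain = bounces.rpartition("@")
    mailbox, _, host = recipient.rpartition("@")
    return f"{local}+{mailbox}={host}@{domain}"


def verp_decode(address: str):
    """Recover the original recipient from a VERP address, or None."""
    m = _VERP_RE.match(address)
    if m is None:
        return None
    return f"{m['mailbox']}@{m['host']}"
```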
And then an additional problem is that Mailman does not strip X-AuthenticatedSender headers from the poster's message. This header, added by authenticated SMTP, reveals very clearly who the sender is, even when the list is set to anonymous posting!
Maybe if your list wasn't anonymous, you'd get fewer spam complaints? :)
As an unrelated feature though, being able to give Mailman a configurable list of headers to strip out of messages would be useful.
-- Colin Palmer <colinp@waikato.ac.nz> University of Waikato, ITS Division
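A sketch of the configurable header-stripping step suggested above, using the standard `email` package. `DEFAULT_STRIP_HEADERS` and `strip_headers` are hypothetical; in Mailman 2 such a step would presumably live somewhere in the handler pipeline (Barry pointed to Mailman/Handlers/CookHeaders.py as the place where headers are added).

```python
from email.message import Message
from typing import Iterable

# Hypothetical per-list setting; X-AuthenticatedSender is the header from this thread.
DEFAULT_STRIP_HEADERS = ("X-AuthenticatedSender",)


def strip_headers(msg: Message,
                  names: Iterable[str] = DEFAULT_STRIP_HEADERS) -> Message:
    """Remove every occurrence of each named header (names are case-insensitive)."""
    for name in names:
        del msg[name]  # deletes all occurrences; no error if the header is absent
    return msg
```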
participants (5)
- Barry Warsaw
- Brad Knowles
- Colin Palmer
- John A. Martin
- Somuchfun