Limiting copies of cross posts sent
Here's one idea for limiting the number of messages sent when a message is cross-posted to multiple lists. I haven't actually tried implementing it yet, so it may be impractical. What I like about this method is that it doesn't require keeping a database of message-ids.
Ok, here's the idea:
SMTP allows for multiple recipients to be specified for any message sent. A typical conversation would look something like this (apologies if my mailer wraps the lines):
    220 python.org ESMTP Sendmail 8.9.1a/8.9.1 (klm); Fri, 29 Jan 1999 14:15:14 -0500 (EST)
    HELO max.ollie.clive.ia.us
    250 python.org Hello IDENT:jcollie@max.ollie.clive.ia.us [161.210.214.102], pleased to meet you
    MAIL From: <jeff@ollie.clive.ia.us>
    250 <jeff@ollie.clive.ia.us>... Sender ok
    RCPT To: <mailman-developers@python.org>
    250 <mailman-developers@python.org>... Recipient ok
    RCPT To: <mailman-users@python.org>
    250 <mailman-users@python.org>... Recipient ok
    DATA
    354 Enter mail, end with "." on a line by itself
    <real message here>
    .
    250 OAA18072 Message accepted for delivery
Now, if you can get the SMTP daemon to invoke MailMan once and pass all of the recipients, MailMan could use that information to send one copy of the message to the union of the set of subscribers for each list.
Of course MailMan would need to be changed significantly to handle this, but you wouldn't need a database.
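A minimal sketch of the union step, assuming the MTA really does hand over all envelope recipients in one call. The `subscribers` mapping (list address to member set, with lower-case keys) and the function name `fan_out_recipients` are hypothetical and not part of Mailman.

```python
from typing import Dict, Iterable, List, Set


def fan_out_recipients(
    envelope_rcpts: Iterable[str],
    subscribers: Dict[str, Set[str]],
) -> List[str]:
    """Return one sorted, de-duplicated delivery list for a cross-post.

    envelope_rcpts: the RCPT To addresses of the incoming message.
    subscribers: hypothetical mapping from lower-case list address to members.
    """
    union: Set[str] = set()
    for rcpt in envelope_rcpts:
        # Lower-case the list address; assumes the mapping keys are lower-case.
        members = subscribers.get(rcpt.lower())
        if members is None:
            # Not a list we host; a real MTA would route it elsewhere.
            continue
        union |= members
    # Each subscriber appears once, however many of the lists they are on.
    return sorted(union)
```

For a member of both mailman-developers and mailman-users, the address shows up once in the result, so one copy goes out instead of two.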
Jeff
On Fri, Jan 29, 1999 at 01:26:46PM -0600, Jeffrey C. Ollie wrote:
Now, if you can get the SMTP daemon to invoke MailMan once and pass all of the recipients, MailMan could use that information to send one copy of the message to the union of the set of subscribers for each list.
Having just spent gods know how much time hacking up sendmail, this is a problem... here's why..
- Sendmail only does this under SMTP, not under anything else
So, that means that Mailman would have to accept its messages over SMTP, which isn't in itself a problem... the problem is that, as far as I've been able to determine, sendmail can't be told to deliver SMTP mail to a port other than 25. Also, you'd end up with mailer rewrite rules out the ying-yang :-) Trust me, you never ever ever ever want to do anything that requires someone to add anything bizarre to their sendmail installation.
Of course MailMan would need to be changed significantly to handle this, but you wouldn't need a database.
That's not nearly as ugly as the problems with doing it ... also it'd be different for every mailer, and quite different in many cases I think.
-- | Christopher Petrilli | petrilli@amber.org
"Christopher G. Petrilli" wrote:
On Fri, Jan 29, 1999 at 01:26:46PM -0600, Jeffrey C. Ollie wrote:
Now, if you can get the SMTP daemon to invoke MailMan once and pass all of the recipients, MailMan could use that information to send one copy of the message to the union of the set of subscribers for each list.
Having just spent gods know how much time hacking up sendmail, this is a problem... here's why..
- Sendmail only does this under SMTP, not under anything else
Not if you add the 'm' flag to the delivery agent. Then sendmail will invoke the delivery agent with multiple recipients specified on the command line.
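With `F=m` set on a mailer, sendmail can expand `$u` in the mailer's `A=` argument list into several recipients, so the delivery agent runs once for several lists instead of once per list. A minimal sketch of the receiving side, assuming a hypothetical mailer whose `A=` line is just the program name followed by `$u` (so `argv[1:]` is exactly the recipient list) and that the message arrives on standard input. Whether each argument is a bare list name or a full address depends on the site's rewrite rules. `recipients_from_argv` and `handle` are illustrative names, not Mailman functions.

```python
from email import message_from_bytes
from typing import List, Optional, Tuple


def recipients_from_argv(argv: List[str]) -> List[str]:
    """Return the recipients sendmail put on our command line.

    Assumes the (hypothetical) mailer definition passes nothing but the
    program name and the $u expansion, so argv[1:] is the recipient list.
    Case is folded and duplicates are dropped, keeping first-seen order.
    """
    seen = set()
    result: List[str] = []
    for arg in argv[1:]:
        addr = arg.strip().lower()
        if addr and addr not in seen:
            seen.add(addr)
            result.append(addr)
    return result


def handle(argv: List[str], raw: bytes) -> Tuple[Optional[str], List[str]]:
    """Parse one invocation: recipients from argv, the message from raw bytes.

    Returns the Message-ID header (None if absent) and the recipient list;
    a real agent would go on to deliver one copy to the union of the
    subscribers of those lists.
    """
    msg = message_from_bytes(raw)
    return msg.get("Message-ID"), recipients_from_argv(argv)
```

Installed as the agent, the script would end with something like `handle(sys.argv, sys.stdin.buffer.read())`.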
So, that means that Mailman would have to accept its messages over SMTP, which isn't in itself a problem... the problem is that, as far as I've been able to determine, sendmail can't be told to deliver SMTP mail to a port other than 25.
Actually you can, see p522 in the 2nd edition of _Sendmail_ (section 30.4.1.2).
Also, you'd end up with mailer re-write rules out the ying-yang :-) Trust me, you never ever ever ever want to do anything that requires someone to add anything bizarre to their sendmail installation.
Yeah, that's for sure. We'd want to keep it simple.
Of course MailMan would need to be changed significantly to handle this, but you wouldn't need a database.
That's not nearly as ugly as the problems with doing it ... also it'd be different for every mailer, and quite different in many cases I think.
Yes, existence of multiple SMTP daemons is a problem in that we'd have to provide documentation on how to modify the configuration for many different daemons. However doesn't this already happen? Aren't sendmail, exim, qmail, or whatever sufficiently different in their implementation that setup is already problematic?
Another, more radical idea would be to completely replace the regular SMTP daemon with an SMTP daemon written in Python that integrates directly with MailMan. I've been considering doing that for another, more radical project that I have in mind. The only problem with this is that you'd have to dedicate a box to running MailMan and there would be some performance issues, but I guess this would be offset by the ability to run MailMan on a Windows or Mac OS system.
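The session logic such a daemon needs for this trick is small: the server sees every `RCPT To:` before `DATA`, so it can pass the whole recipient list to the list manager in one call. A minimal sketch of that logic only, with no sockets, no ESMTP extensions and almost no error handling; `SmtpSession` and its `deliver` callback are hypothetical names, and the reply texts are modelled loosely on the transcript earlier in the thread.

```python
from typing import Callable, List, Optional

# deliver(sender, recipients, message_text): hypothetical hook into the list manager.
Deliver = Callable[[str, List[str], str], None]


def _angle_addr(line: str) -> str:
    """Extract the address between '<' and '>' (empty for the null sender <>)."""
    start, end = line.find("<"), line.rfind(">")
    return line[start + 1:end] if 0 <= start < end else ""


class SmtpSession:
    """Line-at-a-time SMTP dialogue that gathers every envelope recipient.

    Handles only HELO/EHLO, MAIL, RCPT, DATA, RSET and QUIT; an illustration,
    not a usable MTA.
    """

    def __init__(self, deliver: Deliver) -> None:
        self.deliver = deliver
        self.in_data = False
        self.data_lines: List[str] = []
        self.sender: Optional[str] = None
        self.rcpts: List[str] = []

    def _reset(self) -> None:
        self.sender = None
        self.rcpts = []

    def handle(self, line: str) -> Optional[str]:
        """Feed one line (without CRLF); return the reply, or None inside DATA."""
        if self.in_data:
            if line == ".":
                self.in_data = False
                # One call carrying all recipients: the point of the idea.
                self.deliver(self.sender or "", list(self.rcpts),
                             "\n".join(self.data_lines))
                self.data_lines = []
                self._reset()
                return "250 Message accepted for delivery"
            # Undo dot-stuffing: the client doubled any leading '.'.
            self.data_lines.append(line[1:] if line.startswith(".") else line)
            return None

        verb = line[:4].upper()
        if verb in ("HELO", "EHLO"):
            return "250 Hello"
        if verb == "MAIL":
            self._reset()
            self.sender = _angle_addr(line)
            return "250 Sender ok"
        if verb == "RCPT":
            if self.sender is None:
                return "503 Need MAIL before RCPT"
            self.rcpts.append(_angle_addr(line))
            return "250 Recipient ok"
        if verb == "DATA":
            if not self.rcpts:
                return "503 Need RCPT before DATA"
            self.in_data = True
            return '354 Enter mail, end with "." on a line by itself'
        if verb == "RSET":
            self._reset()
            return "250 Reset state"
        if verb == "QUIT":
            return "221 Closing connection"
        return "500 Command unrecognized"
```

Feeding it the HELO/MAIL/RCPT/RCPT/DATA sequence from the transcript ends with a single `deliver` call carrying both list addresses, which is what the union step needs.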
The idea of using a database to filter out multiple copies would be problematic for sites that have high volume lists with a large subscriber base. Just think of all of the storage that you'd need!
Jeff
On Fri, Jan 29, 1999 at 02:46:59PM -0600, Jeffrey C. Ollie wrote:
- Sendmail only does this under SMTP, not under anything else
Not if you add the 'm' flag to the delivery agent. Then sendmail will invoke the delivery agent with multiple recipients specified on the command line.
Interesting, I've never seen this used for mailing lists, so I don't know what to say... obviously this is a sendmail-only thing, which gets into the whole problem outlined below.
So, that means that Mailman would have to accept its messages over SMTP, which isn't in itself a problem... the problem is that, as far as I've been able to determine, sendmail can't be told to deliver SMTP mail to a port other than 25.
Actually you can, see p522 in the 2nd edition of _Sendmail_ (section 30.4.1.2).
I stand corrected, although I will point out that this is buried quite deep in the sendmail world, and 5 people I know who really know sendmail didn't know this :-) Hence, it's obscure at best... Regardless...
That's not nearly as ugly as the problems with doing it ... also it'd be different for every mailer, and quite different in many cases I think.
Yes, existence of multiple SMTP daemons is a problem in that we'd have to provide documentation on how to modify the configuration for many different daemons. However doesn't this already happen? Aren't sendmail, exim, qmail, or whatever sufficiently different in their implementation that setup is already problematic?
Dunno, at least with sendmail/postfix, it's just trivial additions to the /etc/aliases file. I use the same ones for postfix as you would for sendmail, absolutely no exceptions. I THINK Exim behaves the same way, and I think qmail does as well. This is trivially simple, and it's also how every other mailer plugs into the MTA world. (I also run full-blown LISTSERV.)
Another, more radical idea would be to completely replace the regular SMTP daemon with an SMTP daemon written in Python that integrates directly with MailMan. I've been considering doing that for another, more radical project that I have in mind. The only problem with this is that you'd have to dedicate a box to running MailMan and there would be some performance issues, but I guess this would be offset by the ability to run MailMan on a Windows or Mac OS system.
I'm not sure this is a plus, considering I want a REALLY reliable system :-) I don't have a problem with someone writing a Python MTA (I've thought about it many times) BUT... Mailman can't use it, at least in my opinion. Period. The MTA has to deal with /etc/aliases files just like everyone else; it's a de facto standard at this point.
The idea of using a database to filter out multiple copies would be problematic for sites that have high volume lists with a large subscriber base. Just think of all of the storage that you'd need!
Actually, locality of reference fixes this in MOST cases... that is, it's "safe" to assume for handwaving purposes that email originating from userA will reach mailserverB in x time... and that x is the same for all the different lists on X. So it's reasonably safe to assume that Mailman will see multiple copies almost simultaneously (i.e. no more than 5 minutes apart?). What this means is that you only have to CACHE the Message-Id information, not keep it forever. And you're only keeping it for cross-posted messages, not everything. I think this is totally doable with a decent caching mechanism (probably in C code, added to the normal Python distribution).
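A minimal sketch of such a short-lived cache, written in plain Python with an `OrderedDict` rather than the C-level mechanism suggested above. It remembers, per Message-Id, which subscribers already got a copy, and forgets the entry `ttl` seconds after the first copy arrived. `CrossPostCache`, `is_cross_post` and the 300-second default are hypothetical, chosen to mirror the "no more than 5 minutes?" guess rather than any measurement.

```python
import time
from collections import OrderedDict
from typing import Iterable, List, Optional, Set, Tuple


def is_cross_post(header_addrs: Iterable[str], local_lists: Set[str]) -> bool:
    """True if the To/Cc addresses name two or more lists hosted here.

    Assumes local_lists holds lower-case list addresses.
    """
    return len({a.lower() for a in header_addrs} & local_lists) >= 2


class CrossPostCache:
    """Remember briefly who has already been sent a given Message-Id.

    Entries expire `ttl` seconds after the first copy was seen, on the
    assumption that every copy of a cross-post arrives within that window.
    """

    def __init__(self, ttl: float = 300.0) -> None:
        self.ttl = ttl
        # Entries are never re-inserted, so insertion order is first-seen
        # order and the oldest entries always sit at the front.
        self._entries: "OrderedDict[str, Tuple[float, Set[str]]]" = OrderedDict()

    def _expire(self, now: float) -> None:
        while self._entries:
            first_seen, _ = next(iter(self._entries.values()))
            if now - first_seen < self.ttl:
                break
            self._entries.popitem(last=False)

    def filter(self, message_id: str, recipients: Iterable[str],
               now: Optional[float] = None) -> List[str]:
        """Return the recipients who have not yet been sent this message."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if message_id not in self._entries:
            self._entries[message_id] = (now, set())
        _, sent = self._entries[message_id]
        fresh: List[str] = []
        for r in recipients:
            if r not in sent:
                sent.add(r)
                fresh.append(r)
        return fresh

    def __len__(self) -> int:
        return len(self._entries)
```

Consulting `filter()` only when `is_cross_post()` is true keeps the cache limited to cross-posted messages, so its size tracks the number of cross-posts seen in the last few minutes rather than total traffic.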
Chris
Participants (2): Christopher G. Petrilli, Jeffrey C. Ollie