AOL's requirements for spam complaints

OK, I just got off the phone with AOL's postmaster department and got the whole story on why they have to strip the To: address out of complaints. There was a breach-of-privacy lawsuit, so AOL can no longer share the To: field in its feedback loop. So here is the solution: Mailman needs to create something like an x-client-id header containing the recipient's email address, because this header will stay intact when a complaint comes back. The header needs to be created whether Mailman runs in personalization mode or not. So the question is: can Mailman do it or not?
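For readers who want to see what such a header amounts to, here is a minimal sketch using only Python's standard `email` package. It is not Mailman code. The header name `X-Client-Id` is simply the "something like" name from the post above, the `personalize` helper is hypothetical, and whether AOL's feedback loop preserves this particular header is the poster's claim, not something verified here.

```python
import copy
from email.message import EmailMessage


def personalize(template, recipients, header="X-Client-Id"):
    """Return (recipient, message) pairs, one tagged copy per recipient.

    Hypothetical helper: the header name is an assumption taken from the
    thread, not a documented AOL or Mailman convention.
    """
    copies = []
    for rcpt in recipients:
        msg = copy.deepcopy(template)   # leave the template untouched
        del msg[header]                 # no-op if the header is absent
        msg[header] = rcpt              # record who this copy is for
        copies.append((rcpt, msg))
    return copies


template = EmailMessage()
template["From"] = "list@example.org"
template["To"] = "list@example.org"
template["Subject"] = "Newsletter"
template.set_content("Hello, subscribers.\n")

copies = personalize(template, ["a@example.com", "b@example.com"])
```

Note that every copy is now a distinct message, which is exactly the cost the rest of this thread argues about.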

At 2:21 PM -0800 2004/01/29, Somuchfun wrote:
In personalization mode, this kind of information could
theoretically persist in the headers (with suitable source-code modifications). Otherwise, I don't think there's a mailing list manager on the planet that could make this happen.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

Enabling custom headers for each message at least partially destroys the intent of a mailing list: efficient delivery of messages. If each message is customized that way, then you have to send one distinct message to every user instead of a single message with multiple RCPT TO lines. Enabling this would radically increase the server overhead that Mailman causes (back to the Smartlist olden days). Instead of injecting one message to hundreds of recipients, you would be injecting hundreds of unique messages.
This is hardly a weakness of Mailman; it is a weird bit of behaviour on AOL's part.
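The difference between one message with many recipients and many unique messages can be sketched by counting the envelopes handed to the MTA. This is an illustration only: `plan_envelopes` is a hypothetical helper, and the cap of 100 recipients per envelope is an assumed figure, not a Mailman setting.

```python
def plan_envelopes(recipients, personalized, max_rcpts=100):
    """Group recipients into SMTP envelopes (one message body per envelope)."""
    if personalized:
        # Every copy differs, so every recipient needs its own message.
        return [[rcpt] for rcpt in recipients]
    # Identical copies can share one message with many RCPT TO commands.
    return [recipients[i:i + max_rcpts]
            for i in range(0, len(recipients), max_rcpts)]


aol_users = [f"user{i}@aol.com" for i in range(100)]
shared = plan_envelopes(aol_users, personalized=False)
unique = plan_envelopes(aol_users, personalized=True)
print(len(shared), len(unique))  # prints "1 100"
```

The message body is transmitted and queued once per envelope, so the second plan moves roughly a hundred times the data for the same post.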
IMHO of course.
Cheers
-- |) __,,_____________ moron : <moron@industrial.org> (| |) < ___________/ EEEI news : <infosuck@industrial.org> (| |) / /-' musician community : http://ampfea.org (| |) /___/ industrial & DIY culture : http://industrial.org (| |) deterrent industries : http://deterrent.net (|

On Jan 29, 2004, at 6:03 PM, moron@industrial.org wrote:
Sorry, I don't buy this argument. If you have two choices: use more CPU time and network, or improve the end-user experience, choosing "less work for the computer" is almost always the wrong answer.
Yes, you are. And fortunately, it's not the '70s any more, and the resource limitations that caused those design decisions are gone. We aren't on 9600 baud dialups any more, for instance, or trying to run large mailing lists on 286-class machines. Well, I'm sure there are a few of those still out there, but that's no reason to hobble the rest of the universe with designs aimed at the last century.

On January 29, 2004 10:44 pm, you wrote:
Howdy. I do not understand why you would feel that adding a personalized
header makes the list experience any better. Would Usenet be any better
with a customized header for every news article you read? There is a big
difference in my mind between a discussion mailing list and a marketing
system with "Dear <insert name here>" type pseudo personalizations (a bit like
a phone system inserting your name into the "please do not hang up" messages
when you are on hold).
I am also not convinced about the CPU argument. That's a lot of extraneous message IDs to keep track of in databases, bounce detecting, etc. Instead of being able to deliver to 100 AOL users at once you suddenly have to send 100 separate messages. Multiply that by a busy list (some of the ones I look over are up to 150 - 200 a day sometimes) and it is still significant, especially if binaries are involved. I also wonder what effect it will have on archiving (I am not immediately sure but it could be ugly depending on whether it affects the threading complexity). Another side effect is that some servers try to block large volumes of connections from a single server as an anti-spam measure (Shaw Cable here in Canada did this recently), which this would be far more likely to trigger. It's not Mailman's problem of course but something to keep in mind.
But if it works for you, hey go nuts. But the argument to me sounds dangerously similar to the one Microsoft used to push using HTML in email which we are all still feeling the unfortunate fallout from (zero cognitive benefit, plenty of headache). Just because computers are faster now does not mean that resources are suddenly free (as in beer).
Respectfully, IMHO of course.
Cheers
-- ---> (culture) http://industrial.org : (label) http://deterrent.net ---> (community) http://ampfea.org : (hire me) http://codegrunt.com ---> (send EEEI news to) infosuck@industrial.org ---> Whomever dies with the most URLs wins!!!!!!!!!!!!!

On Jan 29, 2004, at 11:04 PM, moron wrote:
Lots of research with end-users, studying their needs and researching the places that they struggle using these systems, and having designed and built a number of list servers over the years that are used by a wide range of users, not all of them geeks.
Would Usenet be any better with a customized header for every news article you read?
different argument. you don't need that user data to unsubscribe from a usenet group. You do to unsubscribe from a list server.
There is a big difference in my mind between a discussion mailing list
Not from the point of view we're talking about here, which is giving the user the info they need to operate the list properly.
With the exception of network traffic, it's actually pretty trivial stuff. No, I can't explain how I know, but I've been there, done that. The only huge cost is the network bandwidth change, which is at least 2X, and can be 5X, depending on your old configurations.
um, heh. Busy. (grin)
I also wonder what effect it will have on archiving
none.
Um, of course, the fact that users want html email is irrelevant. Lots of studies show they prefer the look of HTML to text, actually. Except in the more hard-core geek crews, but we aren't writing stuff here JUST for people who run mutt, right?
you might be surprised. Lots of benefit, no headaches.
chuq (guess what I do for a living?)

On January 29, 2004 11:19 pm, Chuq Von Rospach wrote:
Howdy. Again, how does including an extra header help the end user experience? The original complaint was due to AOL being bass ackwards and somehow feeling that an email address in an arbitrary header was more "private" than the To field (which of course it is not). In this scenario, the "customization" was simply to add the recipient address back into the message, which is hardly making the end experience any better (it should already be in something like "envelope-to" anyway).
How about something more concrete as to why this is such a great feature for something beyond a spam list? (I am *not* suggesting you are a spammer, just that customization would seem to be only really important for commercial mailouts which generally fall under the spam-brella).
different argument. you don't need that user data to unsubscribe from a usenet group. You do to unsubscribe from a list server.
The problem is AOL though, not Mailman. Solution? Switch to a real provider that uses RFC compliant software. And be vocal as to why you are leaving.
Not from the point of view we're talking about here, which is giving the user the info they need to operate the list properly.
The information they need is that AOL is running a broken SMTP server, no?
Ok though I have not seen evidence of this using Exim. But a 2 to 5 times increase in bandwidth use is a lot. The majority of traffic in the community server I look after at the moment is due to mail and we would definitely feel that.
Well, it depends on membership of course and the total number of lists. It's big enough for me to look after.
=)
But a 60,000+ member list on Mailman would suck due to the administration interface anyway.
Are you sure of that? I thought that Hypermail based its threading on message IDs which would be different in this case leading to far larger arrays and such to keep track of what article was connected to what. I could see this having an exponential effect on the length to regenerate archives and for building indexes. That could be a LOT of RAM usage. Any Hypermail gurus want to comment? Am I flapping in the wind on this one?
Hmm. I have yet to see a case where HTML has helped readability, and folks that use it seem to do so solely because it is there, not because they are trying to impart meaning. Some people like using their cell phones in theatres but that doesn't make it a positive feature. When using webmail interfaces, for example, no one misses it that I have ever noticed. As to the effect of HTML, even ignoring the obvious security and privacy nightmare, it still leaves you with horrific rendering problems depending on the exact path the message takes from sending client to viewing client. Nothing like a missing table tag to make your message unviewable.
chuq (guess what I do for a living?)
Debate with me?
=)
Cheers

At 11:51 PM -0800 2004/01/29, moron wrote:
Howdy. Again, how does including an extra header help the end user experience?
It doesn't. Enabling personalization does.
That is pretty bloody stupid, and is the real issue that we
should be discussing.
The problem is AOL though, not Mailman. Solution? Switch to a real provider that uses RFC compliant software. And be vocal as to why you are leaving.
Your recipients are where they are. You can't really make them
move. You can refuse to accept any recipients on AOL, but that's about it.
The information they need is that AOL is running a broken SMTP server, no?
Not a broken SMTP server, per se. It's broken policies with
regards to handling spam and mailing lists, and their stupid sanitization methods which they are asking you to work around so that the very information they sanitized is exposed elsewhere in the message.
Nevertheless, I was working at AOL when they implemented their
current SMTP server, and I can confirm that it is pretty badly broken in plenty of other ways.
Well, it depends on membership of course and the total number of lists. It's big enough for me to look after.
I guess that if you're running all the mailing lists for
apple.com, then you don't really care about increased bandwidth charges, or any of the other increased costs of running the mailing lists.
No, Chuq is right about this. The archiving is done on message
input, which is not changed as a result of personalization on message output.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 02:51, moron wrote:
Ok though I have not seen evidence of this using Exim. But a 2 to 5 times increase in bandwidth use is a lot.
C'mon, isn't legitimate mail of any kind now just noise in the spam/virii storms? You won't even notice it. :)
-Barry

At 11:19 PM -0800 2004/01/29, Chuq Von Rospach wrote:
With the exception of network traffic, it's actually pretty trivial stuff.
Uh, no. Not just "no", but "Hell, no!"
Increased network traffic is one cost, yes. But there are plenty
of other additional costs as well, some of which are considerably more important.
And you might be surprised. Lots of benefit for the users, yes.
Lots of cost on the system, also.
chuq (guess what I do for a living?)
I know what you do for a living, and I know that you know what I
have done for a living.
The issue is not as simple as you make it out to be.
-- Brad Knowles, <brad.knowles@skynet.be>

At 10:44 PM -0800 2004/01/29, Chuq Von Rospach wrote:
You know damn good and well that this is not a CPU issue. This
is a disk I/O capacity issue (synchronous meta-data updates). Moreover, you also know full well that there are serious performance issues with enabling personalization mode on large mailing lists, such that for some lists, it would simply be impossible to do.
The increased CPU utilization and network bandwidth can be
problems for some sites, but that is not the gating factor in most cases.
This is not a valid criticism. As network bandwidth has
increased, the numbers of messages being sent and the size of the messages being sent has also increased, and the number of recipients has also increased. What has *not* increased significantly is disk I/O latency, which is the gating factor for synchronous meta-data updates.
You've got a significant increase in demand along three separate
axes, without a corresponding increase in capacity. Something has to give. We have to be more intelligent about how we deliver those messages, or the entire system grinds to a halt.
We aren't on 9600 baud dialups any more, for instance, or
trying to run large mailing lists on 286-class machines.
See above. The CPU being utilized is irrelevant. What is not
irrelevant is disk I/O latency, a fact that I know that you know as well as or better than most.
-- Brad Knowles, <brad.knowles@skynet.be>

Brad Knowles wrote:
Why is it, then, that Lyris can send personalized messages to lists with hundreds of thousands of members with no problem? I don't personally have any lists that are nearly that big but I can tell you that my Lyris box sends messages to my lists with a few thousand members extremely quickly. Having personalization as a *choice* is the best thing. Then, those who worry about disk I/O or whatever can live with non-personalized delivery (at the expense of the users, of course), and those who want to move forward into the 21st century can do so with personalized delivery.
Mailing list communities want more now. Especially in Communities of Practice. Our most recent request was to tack on a person's professional profile (from another datasource) on the end of each message he or she sends. Feasible? Maybe, maybe not. But people do want this kind of thing. And I get paid to deliver what is needed. The fact is that Lyris does personalization just fine. So why continue to let Mailman lag behind?
Barry and others will be (or are) working on Mailman 3. I think that he/they should take a long hard look at the commercial MLM success story (Lyris) and take a few pages out of that book. They spent millions of dollars on R&D and made decisions based on it. Why not tap into that? Personalized delivery is just one thing. Don't get me started on SQL issues and the need for vastly improved logging for forensic purposes.
- Kevin

On Jan 30, 2004, at 5:52 AM, Kevin McCann wrote:
Lyris has made the choice it's worth it. So has mailman with personalization.
Brad is right that I trivialized some resource issues last night -- but that doesn't change my belief that for the user, it's worth using those resources to improve things for them. You don't want to waste resources; you also don't want to not use them when it's the right thing to do.
Better yet, look at the users. They aren't geeks any more. They're your mom and dad, off on a cable modem somewhere, with a provider that has been through two or three acquisitions and domain name changes, and these folks aren't really sure what their email address is (their smart son configured their computer for them), much less what it was three changes ago when they signed up for the list.
and now they want to turn it off and go on vacation for a month, and the plane leaves in six hours.
If they can't -- it's your fault, as admin. And they're right.
that's why personalization matters.

--On Friday, January 30, 2004 08:52:05 -0500 Kevin McCann <kmccann@bellanet.org> wrote:
Why is it, then, that Lyris can send personalized messages to lists with hundreds of thousands of members with no problem? I don't personally
If you'd read your own thread, you'd know the answer already. Lyris is its own MTA - it speaks SMTP directly to the recipients' mail servers. This allows it to do on-the-fly customization at SMTP transmit time instead of having to queue each unique message.
This is _very_ _hard_ to get right, just to do SMTP properly. Personalization makes it even more difficult. Simple example: someone's mail server is down. Do you:
- queue the personalized message
- queue the message template, and the list of undelivered recipients
- queue the message template, the list of undelivered recipients, and the substitution db version
Each choice has significant implications. None is obviously correct.
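To make the trade-off between those three queueing choices concrete, here is a small sketch using `string.Template` from the Python standard library. Everything in it is illustrative: the subscriber record, the in-memory `db`, and the version-snapshot scheme are assumptions made up for the example, not how Lyris or any MTA actually stores its queue.

```python
import copy
from string import Template

template = Template("Dear $name, your address on file is $addr.")
db = {"a@example.com": {"name": "Ann", "addr": "a@example.com"}}
db_versions = {1: copy.deepcopy(db)}  # snapshot, needed only for option 3

# Option 1: queue the fully personalized message.
queued_rendered = template.substitute(db["a@example.com"])
# Option 2: queue the template plus the undelivered recipients.
queued_template = (template, ["a@example.com"])
# Option 3: as option 2, plus the substitution db version.
queued_versioned = (template, ["a@example.com"], 1)

# The subscriber record changes while the remote server is down.
db["a@example.com"]["name"] = "Anne"

# What each option sends on retry:
retry_1 = queued_rendered                                   # frozen text
retry_2 = queued_template[0].substitute(db[queued_template[1][0]])
tmpl, pending, version = queued_versioned
retry_3 = tmpl.substitute(db_versions[version][pending[0]])
```

Option 1 costs one stored message per undelivered recipient. Option 2 stores little but can send different text on retry than the first attempt would have. Option 3 is reproducible but requires keeping old versions of the substitution data around.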
It would be great if MTAs included this functionality, but there are major political players who are terrified this will just be used by spammers. Personally, I think spammers could do it trivially already, as they don't care about queueing mail properly and handling all the edge cases for SMTP. But I'm not the maintainer of postfix/exim/sendmail/etc., so it's not my decision.
I'll make you a deal - you write the MTA, and I'll add support in mailman to offload the personalization.
-- Carson

Carson Gaspar wrote:
Fair dues.
I'll make you a deal - you write the MTA, and I'll add support in mailman to offload the personalization.
I do not personally have the skills to do this but I wouldn't rule out trying to get the funding to help make it happen. I wonder if there is enough collective know-how among Mailman developers and other interested parties. Let me ask: if you don't see this as being a priority now, do you see it as being such in two years, three years, five?
More than anything I'd like to see an open source MLM that can keep up. One that can meet the ever growing list of challenges as well as expectations. So, looking down the road, where do others see things going? Should the OS MLM status quo remain, or ought there be an effort to plan for the future? Now that Mailman 3 is on the table, is a built-in MTA an issue for discussion, or is it completely unrealistic? Will it always be unrealistic?
- Kevin

At 1:23 PM -0500 2004/01/30, Kevin McCann wrote:
Do you have any concept as to how much work has gone into
sendmail over the past twenty-plus years? Or postfix, or Exim? They tend to get most things right, but even now they have plenty of problems -- they just have fewer problems than most other MTAs.
You might be talking about a change that would require
essentially throwing out everything that has been done before, and starting over from scratch.
Do you have the millions of dollars and human-centuries worth of
productive coding that it would take to write yet another MTA properly?
More than anything I'd like to see an open source MLM that can keep up.
There are some things that Barry has already ruled out. Writing
a custom MTA for Mailman is one of those things.
Don't even bother barking up this tree.
Perhaps, for Mailman 3, Barry could talk to people like Eric
Allman, Wietse Venema, and other solid MTA authors, to see if there is a way we could get a certain amount of message customization pulled into the MTA, without killing the performance of the machine.
But that's a question that Barry would have to answer.
-- Brad Knowles, <brad.knowles@skynet.be>

Brad Knowles wrote:
I'm simply thinking about MLM challenges, which are increasing every day, it seems, and thinking about what can be done. Personalization without critical slowdown is an issue. I am thinking out loud about what can be done. I don't have the answers, which is why I invite discussion. Pleasant discussion, if you can find it within you. I have not been on mm-dev forever, so I am not aware of every thread that has been discussed. Further, am I willing to contribute? Yes I am. Millions, no. But something, yes. Give a guy a break.
- Kevin

On Jan 30, 2004, at 11:42 AM, Kevin McCann wrote:
I'm simply thinking about MLM challenges, which are increasing every day, it seems,
I disagree. the MLM stuff is doing quite well. There are challenges at the e-mail level, but non-MLM-email suffers as badly as MLM-email. And I really think the spammers have moved into their own version of the "battle of the bulge", an increasingly difficult fight with ever reducing gains. It's ugly right now, but it seems to me that's at least in part due to an increasing sense of urgency by the spammers.
(more here if you care: http://www.plaidworks.com/chuqui/blog/001252.html)
and thinking about what can be done. Personalization without critical slowdown is an issue.
define "critical". personalization is inherently more resource intensive than not personalizing. Physics wins. it's more difficult to send 20 emails to 20 people than one email to 20 people; and the reality is, you can't personalize without sending those 20 emails.
Can things be improved to minimize that resource cost? Definitely. Is it a high priority? For the vast majority of mailman users, no.

On Fri, 2004-01-30 at 14:20, Brad Knowles wrote:
I'd support such an effort. I think the right way to go about this would be to design a protocol (or perhaps an API) for MLM/MTA communication. I'd be less enthusiastic about a solution that was unique to a particular MTA.
Hey Brad, maybe you can dust off those protocol specs you once did. Or maybe you can make some first passes at a new specification. That would be a good jumping off point for approaching the MTA communities.
-Barry

At 3:16 PM -0500 2004/01/30, Barry Warsaw wrote:
Agreed. I think that would be a poor choice.
Sorry, we never got to the specs stage. We got to talking about
things, then talking about the spam issue, and then the whole idea basically died right there.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 13:23, Kevin McCann wrote:
It's possible. But then again who knows what the email landscape will look like in 5 years? I'm betting it'll look a /lot/ different than it does today, unless it doesn't.
More than anything I'd like to see an open source MLM that can keep up.
When you say "keep up", remember that because of the design decisions I outlined in my previous message, you're really talking about the MLM+MTA combination.
For all practical purposes, it's unrealistic for Mailman 3. Our biggest challenge is going to be avoiding second system syndrome, and we'll have no hope if we don't limit scope.
OTOH, I think it's important to keep this in mind as we design the delivery subsystem. I think we win if MM3's architecture makes it technically possible, even if it remains insane for us to attempt it given our current level of resources.
-Barry

Definitely. There are probably a dozen on this list who could write an MTA -- or have. The problem is, that dozen or so folks all (and I hope I speak for those people appropriately as I speak for myself) have come to the realization that it's rarely if ever cost-effective or worth the effort.
A secondary issue is there are more and more mutterings and grumblings that it's time to get serious and replace SMTP. If you integrate an SMTP server into Mailman and we go off and replace SMTP, where are you? out on a limb with a chain saw.
While Lyris has a lot going for it, its tightly coupled MLM/MTA is a feature that's a mixed blessing. Now, if SMTP is replaced properly and the warts any MTA has to deal with (Hello, Lotus Notes. Hello, Exchange. Hello, you know who you are) can get scraped off and not replaced with new warts, interfacing at the MTA level might be more practical.
But I wouldn't recommend it, support it, or encourage it with Mailman. not now, not in a year, not in five. Not to SMTP.
Mailman has a lot of things to do to become an even better mailing list manager before we should even think about trying to re-implement what the MTA teams are already spending all of their time on.
And I think we can do within Mailman what you think you need to integrate an MTA to do, without all of that pain and suffering. Or at least enough of it to not warrant going through the swamp to get there.
And trust me, SMTP is a swamp, with lots of hungry alligators.

Here is what I do not understand from the discussion: Mailman in its current form is slow, and if personalization is turned on, users cannot even get into the Mailman site anymore because it takes up all available resources. We are running a list with about 50,000 subscribers. As an admin I do not really care whether some people think AOL has its act together or not; if I want my emails to reach them, then I have to play by their rules.
Like I said, I have tried other software on the market and used its personalization feature. I even tried the same list on the same machine. With personalization, Mailman needed about 8.5 hours to send one message to all 50,000 people, and Lyris ListManager needed about 4.5 hours. Is disk I/O a problem? Of course it is, but it is a problem for all list managing software packages. My experience is that Mailman is just very slow when it comes to db access. Just try to add 10,000 users at once and most likely you get a timeout. So perhaps mailman is better for smaller discussion list than for larger email lists.
Some people here have suggested that anything besides email discussion lists are spam, I find statements like this alarming. We run a newsletter where people actively want to get the newsletter and we do not consider ourselves spamming these people. In fact we try very hard to comply with all rules, regulations and expectations, more so than some ISPs. All I want is a fast and cheap engine that can help me reach my goal: to get the email to my customers quickly and to offer easy management capabilities. So far I like mailman's management capabilities. The performance has left me being disappointed.
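As a quick sanity check on the figures quoted in this message, the two run times can be turned into delivery rates. The only inputs are the numbers stated above (50,000 recipients, 8.5 hours, 4.5 hours); the helper name is made up for the example.

```python
def messages_per_second(recipients, hours):
    """Average delivery rate over the whole run."""
    return recipients / (hours * 3600)


mailman = messages_per_second(50_000, 8.5)
lyris = messages_per_second(50_000, 4.5)
print(round(mailman, 2), round(lyris, 2), round(lyris / mailman, 2))
# prints "1.63 3.09 1.89"
```

So on that machine both systems were delivering only a few messages per second, and Lyris was a little under twice as fast.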

Howdy. I would think that Mailman's job is not to provide free marketing tools but to act as a list processor. For what it offers it is the best trade off of features, performance and price going for small to medium sized lists. If you want Lyris you should pony up and pay for it IMHO.
If all you want is a customized one way mailout then it doesn't sound like you are looking for a mailing list processor as much as a mass mailer and there are other options for that kind of thing.
But that's just my opinion of course.
Cheers
-- moron : <moron@industrial.org>

On Jan 30, 2004, at 11:10 AM, moron@industrial.org wrote:
Mailman is a tool. Asking it to discern intent in its use is like asking a gun to only shoot bad people. The gun does what it's told. So does Mailman.
And your view of this stuff is very simplistic, IMHO. The real world is a lot more complex.
But that's just my opinion of course.
ditto, of course.

At 10:49 AM -0800 2004/01/30, Somuchfun wrote:
Which may be because they have implemented their own custom MTA,
something that very few other MLMs in the world have done or can do. Listserv with LSMTP being the only other example I can think of off the top of my head.
Is disk I/O a problem? Of course it is, but it is a problem for all list managing software packages.
It can be less of an issue for those MLMs that have implemented
their own custom MTA.
So perhaps mailman is better for smaller discussion list than for larger email lists.
Yup. If your list is too big for Mailman, maybe you need to find
a different MLM. Perhaps some day Mailman will have had the performance increased enough that it could handle lists that large, but maybe it can't handle them today.
Keep in mind that this is not a problem for 99% of the lists out
there that are handled today with Mailman, and there are even lists with over 200,000 recipients in operation, which are running just fine with Mailman.
Maybe Mailman is not able to handle that load on the machines you have.
So far I like mailman's management capabilities. The performance has left me being disappointed.
Perhaps it is the wrong software for your application.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 13:49, Somuchfun wrote:
I don't totally believe that.
Understand that IMO, MM2.1's biggest architectural flaw right now is the list data storage arrangement. Pickles and list locks simply do not scale. Fixing this is a priority for MM3, but of course there will be costs. Backing Mailman with a real database (be it BerkeleyDB, MySQL, or whatever) increases the administrative costs. No way around it.
That said, MM2.1 does not retain a list lock while it delivers messages to its MTA, so it should not lock out other access to the site.
Again, the issue is likely deeper than it first appears. I will bet you that Mailman's "db access" is about as fast as you can possibly get, because the list data resides completely in memory. Lookups are a simple dictionary access, which is very fast.
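A rough sketch of the distinction being drawn here, using only the standard library. The `members` layout below is invented for illustration and is not Mailman's real list schema. The point is only that a lookup on in-memory list data is a dictionary access, while persisting that data with a pickle means serializing the whole object.

```python
import pickle

# Hypothetical in-memory list data: address -> per-member options.
members = {f"user{i}@example.com": {"digest": False} for i in range(1000)}

# A membership lookup is a plain dictionary access.
is_member = "user123@example.com" in members
options = members["user123@example.com"]

# Saving list state as a pickle serializes the entire structure,
# however small the change that triggered the save.
blob = pickle.dumps(members)
restored = pickle.loads(blob)
```

If every change to a list goes through a full save like this while holding the list lock, large bulk operations would be expected to slow down as the list grows, which would fit the scaling concern raised earlier in this message. That is an inference from the thread, not a measurement.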
Where I believe you're getting clobbered is in the specific code that generates the unique recipient copies. The technique I'm using is about as good as you can do in Python 2.1, which is MM2.1's minimum requirement. I can do a lot better if we set Python 2.3 as a baseline and make other incompatible changes. That's why it's all pushed off to Mailman 3.
And being twice as slow as Lyris is actually not bad, IMO. Lyris is probably written in C or C++. For a pure Python application like Mailman to only take twice as long is not bad.
So perhaps mailman is better for smaller discussion list than for larger email lists.
As Mailman gains in popularity, people will try to make it do things it wasn't necessarily designed for, or that weren't conceivable 6 years ago when many of the basic architectural decisions were made.
Some people here have suggested that anything besides email discussion lists is spam. I find statements like this alarming.
Spam is anything the user doesn't want to get.
I have no problem with that.
So how much would you pay to improve Mailman's performance? If we could raise a quarter million dollars in development funds, I doubt you'd be disappointed for long <wink>.
-Barry

At 10:49 AM -0800 2004/01/30, Somuchfun wrote:
Of course, this doesn't address the issue of MTA performance
tuning. I've seen situations where proper tuning resulted in a factor of ten (or more) improvement in the delivery times. See <http://www.usenix.org/publications/library/proceedings/lisa97/full_papers/21... > and <http://www.usenix.org/events/lisa98/full_papers/chalup/chalup_html/chalup.ht... > for two papers discussing this issue.
See also my slides at
<http://www.shub-internet.org/brad/papers/sendmail-tuning/> and <http://www.shub-internet.org/brad/papers/dihses/>, and the book _Sendmail Performance Tuning_ by Nick Christensen (at <http://www.jetcafe.org/~npc/book/sendmail/>).
If you haven't done your job in tuning the performance of the
MTA, you really don't have much reason to complain about the performance of a mailing list manager with a lot of recipients.
-- Brad Knowles, <brad.knowles@skynet.be>

On Jan 30, 2004, at 10:49 AM, Somuchfun wrote:
We are running a list with about 50,000 subscribers.
that's a fair sized list, yes. What's it running on? Are you asking too much of your hardware?
The people who think you can just ignore AOL have a really unrealistic view of the real world where most of us live.
(on the other hand, if you look at the latest numbers out of the direct marketing associations, AOL shed 800,000 paying customers last QUARTER. Of those, 450K were converted to a non-revenue style "incentive" account (something similar to "N months free if you agree to stay a year"), but another 390K cancelled anyway despite being given that incentive.
By my count, that's over 3% of their user base -- in a quarter. And Morgan Stanley's analysts are saying they're expecting that loss to top a million paying accounts this calendar year, so unless AOL can figure this out, we're talking serious death spiral numbers. If you lose 1 out of every 30 customers in a three month period, something's seriously ugly...)
And mailman is free and volunteer based, and Lyris, well, very much isn't. And that definitely makes a difference. there is a TANSTAAFL aspect here...
that is true of almost all MLMs. there are very few specifically optimized for large-scale operations, and 50K is fairly large (well, not for me, but for most of the world). and I admit upfront I don't run any of my large lists on Mailman. They all run on custom built systems optimized for those operations. (and we're hiring help to work on these things, I just posted pointers to more info separately)
they room with the folks who think you can tell AOL to go to hell... (grin)
I'll tell you what: if you find a better free and open source MLM than Mailman for your needs, I'll buy you a nice dinner (because you won't). At some point, "off the shelf" solutions stop scaling, no matter what they are. And at some point, either you find a company like Lyris and pay for their expertise, or find a geek like me or JC and pay us for ours. Even though both of us also volunteer time back to Mailman as well, as do Barry and the other key developers, and we have the knowledge to take Mailman and build a tool that'd blow Lyris off the map (and we do), this ain't our paying job, and what we don't have is the time to do it. Nor, for 95% of the people who use Mailman, do we need to...

At 8:52 AM -0500 2004/01/30, Kevin McCann wrote:
Why is it, then, that Lyris can send personalized messages to lists with hundreds of thousands of members with no problem?
Maybe they have their own custom MTA that is tightly integrated into the mailing list manager.
Sending messages to large mailing lists very quickly is not a problem. Doing so with personalization turned on is a problem.
Personalization is a valid choice. Probably 99% of mailman lists are small enough that the additional performance cost caused by turning on personalization doesn't cause too many problems.
At issue is that other 1% of the largest mailing lists where turning on personalization would not be feasible.
The fact is that Lyris does personalization just fine.
I don't doubt that Lyris can handle personalization just fine.
For that matter, so can Mailman. At issue is what cost do you pay to turn on personalization?
So far, I have seen nothing that leads me to believe that Lyris
is capable of doing this without doing a single delivery per recipient, which is exactly the same thing that Mailman has to do in order to achieve the same goal.
So why continue to let Mailman lag behind?
If it requires implementing a custom MTA, that's not going to
happen. Barry has already ruled that out.
If you want that kind of thing, go with Listserv and LSMTP.
Hey, give Barry a few million dollars to fix up Mailman properly,
and I'm sure that he could come up with a way to write a custom MTA (or do whatever else is necessary) to make it competitive with other MLMs out there.
Short of that, try contributing some code yourself to solve these problems.
Mailman already does personalization. If that's what you want,
then stop complaining now.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 08:52, Kevin McCann wrote:
And I think we can make Mailman clear its queue of a message very quickly, even with full personalization turned on. How Mailman 2.1 does personalization is not as efficient as it could be, for technical reasons I won't go into right now. I believe we can make Mailman more efficient here.
We've made the decision to not assimilate the MTA into Mailman. The big advantage here is that writing MTAs is hard, takes a huge amount of resources we don't have, and we can leverage the many good open source MTAs out there. The disadvantage is that we're going to pay for personalization in MTA disk i/o. Mailman 2.1 won't get clobbered here (see my previous messages on the subject, and remember that during the betas, MM2.1 /did/ queue each personal message to disastrous results), and Mailman 3 will be better.
I totally buy that personalization improves the user experience, even for discussion lists. I think it's basically a no brainer, all other things being equal. I believe we're making the right choice here because we can support a wide range of system configurations. Small sites either can't afford even a moderate increase in cpu or bandwidth (so they turn off personalization), or can afford it and don't worry about i/o because their traffic is light. Larger sites can afford fast disks, mta smurf farms, and other measures to mitigate the i/o requirements of the mta. Huge sites can write their own special Python delivery module to speak WPMP (Wizzy Personalized Mail Protocol) to their custom in-house blindingly fast weave-it-on-the-wire mail server.
Exactly.
Yes.
So Kevin, you coming to PyconII? I still don't have (m)any volunteers joining me in a Mailman 3 sprint. :(
-Barry

Barry Warsaw wrote:
Hi Barry,
Thanks for your cordial and helpful response. If I can get up-to-speed with Python in order to work on the MySQL side of things, or if you think I could contribute with just the MySQL know-how, I'll go. Otherwise, I'll be sending someone else. I'd ideally like to send someone from one of our partner organizations involved with the Dgroups project, but if we can't find a suitable candidate, then maybe we can find someone from this list. And as I've mentioned before, we'll fund it. So, if anyone is interested in working with Barry on Mailman/SQL in March, let me know.
- Kevin

On Fri, 2004-01-30 at 15:38, Kevin McCann wrote:
At this stage, I'd be happy with just MySQL, or more generally, database expertise. I'm at the stage in my MM3 experimentation where we need to solidify the interfaces. Code comes later, but I'd be really happy if we could come away from a MM3 sprint with solid APIs to the various data storages, and a good architecture for handling transactions across potentially disparate databases, etc. I have no problem implementing a back-end for BerkeleyDB and/or ZODB. I could probably kludge my way through a MySQL back-end (although I'm not really a huge fan of the MySQLdb package).
To Kevin and anyone else who wants to participate: please don't wait until the last minute to sign up for the sprint, or at least signal your intent. Space at Pycon will probably be limited, and I will have to take vacation if I'm going to participate on Monday and Tuesday. I'm not going to do that to sit at a table by myself though. I plan on being there the Saturday and Sunday before the conference no matter what.
Pycon sprint page: http://www.python.org/cgi-bin/moinmoin/SprintPlan2004
Mailman sprint page: http://www.python.org/cgi-bin/moinmoin/Mailman3Sprint
Please add your name to the latter if you're coming.
-Barry

On Fri, 2004-01-30 at 20:06, Barry Warsaw wrote:
A standard MTA has to obey certain rules. The most basic of which is that you do not accept a message (ie +ve status to the . at the end of the DATA section) until you have either finally delivered the message or committed it to stable storage. Mailman talks to a standard local (same or nearby box) MTA.
Lyris is unlikely to have to play this the same way.
Mailman + MTA with personalisation on has to push 50K messages (in the example griped about) to the local MTA each of which causes a batch of disk I/O with a strong synchronous component. Lyris is likely to be able to cheat like hell here.
Of course if it's only a list box, and you don't care too much about absolute auditability through the mail delivery system, you could just switch off sync operations on that filesystem and probably get one hell of a speed up.... at the risk of interesting things happening in the case of a crash.
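A minimal Python sketch of the rule described above: the message is written to the queue and forced to stable storage before the positive reply is returned. The function name `queue_message`, the `sync` flag, and the reply string are illustrative assumptions, not code from any real MTA; a real implementation would also sync the containing directory and handle errors.

```python
import os
import tempfile


def queue_message(queue_dir, name, data, sync=True):
    """Write one message to the queue and return an SMTP-style reply.

    With sync=True the data is forced to disk before we answer, which is
    the safe (and slow) behaviour. With sync=False we answer as soon as
    the data is in the OS cache, which is faster but can lose mail in a
    crash. Both names are hypothetical, for illustration only.
    """
    path = os.path.join(queue_dir, name)
    with open(path, "wb") as fh:
        fh.write(data)
        fh.flush()                 # push Python's buffer to the OS
        if sync:
            os.fsync(fh.fileno())  # ask the OS to commit to stable storage
    return "250 OK"


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(queue_message(demo_dir, "msg-1", b"Subject: test\r\n\r\nhello\r\n"))
```

With personalisation on, the synchronous branch runs once per recipient rather than once per posting, which is where the disk I/O cost discussed here comes from.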
Nigel.
-- [ Nigel Metheringham Nigel.Metheringham@InTechnology.co.uk ] [ - Comments in this message are my own and not ITO opinion/policy - ]

At 10:02 AM +0000 2004/02/02, Nigel Metheringham wrote:
In fact, in the case of announce-only lists of a very
time-sensitive nature (e.g., sending out daily updates of the latest news over the past 24 hours that matches certain search criteria), you can do what InfoBeat/MercuryMail did -- run everything from a RAM disk. In that case, you don't care if there is a crash and millions of messages are lost, since you'll do another run tomorrow.
In fact, if you use one of the battery-backed RAM disks
(solid-state disks, actually) which are supported by Linux and FreeBSD (among others), you can get up to 4GB (or more) of reliable storage that will be lightning fast, and you will have the best of all possible worlds.
This enhancement is mentioned as the final step to maximum
performance gain in my slides at <http://www.shub-internet.org/brad/papers/sendmail-tuning/>. If you're going to seriously consider this route, you probably want to look at the other options, too.
The RocketDrive (see <http://www.cenatek.com/product_rocketdrive.cfm>) is one example, then there's the SolidData SSD (see <http://www.soliddata.com/products/1000/1000_specs.html>) and the RAM-SAN from Texas Memory Systems (see <http://www.superssd.com/default.asp>).
-- Brad Knowles, <brad.knowles@skynet.be>

At 12:27 PM +0100 2004/02/02, Brad Knowles wrote:
I've been looking at the requirements and potential performance
you can get with Lyris ListManager, MailEngine, etc.... See <http://www.lyris.com/lm_help/7.8/Memory_And_Bandwidth_Recom.html> and <http://www.lyris.com/products/mailengine/requirements.html> for the respective requirements, and <http://www.lyris.com/products/listmanager/extreme.html> for an idea of what kind of performance they can offer.
Then look at their prices at
<http://www.lyris.com/products/mailengine/prices.html>. For 500,000 messages per hour with comprehensive support, that's a software-only cost of more than $24,000 (one million messages an hour would cost over $48,000). Using SSDs and the right configuration, I can do a higher level of performance for less money, hardware and software included. Indeed, mailman would be a key part of that system.
If you want to pay commercial prices, you can get higher levels
of performance and capabilities. But if you're not willing to pay those kinds of prices, you have to make some compromises.
You may rarely get what you pay for, but you almost always pay
for what you get -- sometimes much more than you should.
-- Brad Knowles, <brad.knowles@skynet.be>

Brad Knowles wrote:
Another article, which might be OT to the thread but nevertheless interesting:
http://john.redmood.com/osfastest.html
It is co-authored by a Lyris dude and looks at OS choices and performance-related sub-topics.
- Kevin

At 8:45 AM -0500 2004/02/03, Kevin McCann wrote:
I was wondering when someone would bring that up.
I saw that article. I tore them several new openings, and did
the same for Amber Ankerholtz (publisher of _SysAdmin_) for allowing such garbage to be published in her magazine.
Basically, these guys don't know crap, and they were using the
article as an advertisement for their stuff. They should have stuck to the OSes they know and not bothered with trying to include things that they know nothing about.
I've tried to write decent quality articles for _SysAdmin_ in the
past, but Amber's editorial team really let me down, and I have now sworn off them.
I am working on a book on a different subject, and I've got a
booklet idea in the wings on a more closely related subject.
-- Brad Knowles, <brad.knowles@skynet.be>

On Feb 2, 2004, at 3:27 AM, Brad Knowles wrote:
It's definitely useful and a big win. It both clears up general disk I/O, but more importantly (from what I have seen), moves certain key inodes in the delivery file structure off of disk, and since I/O operations have to lock and unlock them for update, the time wasted single-threading through them goes way down (this is why, for instance, you should generate a fairly large number of sub-queues in sendmail; if you're trying to do volume and haven't, you're being silly; it spreads the load across more than one inode)

At 8:35 AM -0800 2004/02/02, Chuq Von Rospach wrote:
Those I/O locking operations are called synchronous meta-data updates.
Imagine trying to clear an entire stadium filled with millions of people. There is one door. It can open and close very fast, but to be sure that everything is happening safely, it has to be closed and locked between every person. So, you unlock the door, open it, step through, close the door behind you, lock it, and then the next person can come through.
No matter how fast you can open and close and lock and unlock that door, that's going to seriously decrease the throughput you can get through the system.
Right, postfix does this by default, and sendmail can easily be
configured to do it as well. There are lots of other nice things that postfix does by default and which more modern implementations of sendmail (e.g., 8.12 and above) also do, either by default, or are easily configured to do. In particular, envelope splitting for large numbers of recipients.
This is like having multiple doors that can be operated
simultaneously. Even if there is a central operating mechanism that can handle a single unlock/open/close/lock operation, you can interleave the operations between multiple doors, and greatly increase the overall throughput.
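The "multiple doors" idea can be sketched in Python as spreading queue files over several sub-queue directories, so that updates do not all serialize on one directory. This is only an illustration: the helper `pick_subqueue`, the `q00`-style names, and the hash-based choice are assumptions, and it is not a description of how sendmail or postfix actually select a queue.

```python
import zlib


def pick_subqueue(message_id, n_queues):
    """Map a message ID to one of n_queues sub-queue directory names.

    A stable hash keeps the choice deterministic for a given ID while
    spreading different IDs across the available directories.
    """
    bucket = zlib.crc32(message_id.encode("utf-8")) % n_queues
    return "q%02d" % bucket


if __name__ == "__main__":
    for mid in ("<1@lists.example.org>", "<2@lists.example.org>"):
        print(mid, "->", pick_subqueue(mid, 8))
```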
Using improved filesystem architectures, such as softupdates,
would be like using revolving doors instead.
Using a RAM disk instead of physical drives is like making those
doors operate at light speed, instead of the limitations imposed on normal wood, glass, and steel by the laws of physics.
-- Brad Knowles, <brad.knowles@skynet.be>

On Mon, 2004-02-02 at 05:02, Nigel Metheringham wrote:
Yep, and Mailman will be speaking SMTP for the foreseeable future. We'll see what replaces SMTP when it's declared illegal <wink>.
That wasn't exactly what I was talking about though. Specifically, in the inner loop where Mailman is weaving the message template with the user specific data, Mailman does more work than it needs to do. Given Python 2.3 as a baseline and a few small incompatible changes, we can be about as efficient as is possible in Python. I've already got working code in my Mailman3 experiment.
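As a rough illustration of the inner loop being discussed (not the Mailman 2.1 or Mailman 3 code, which is not shown in this thread), the shared part of a posting can be prepared once and only the cheap per-recipient substitution left inside the loop. The names `build_template` and `weave`, and the `address` key, are hypothetical.

```python
def build_template(shared_body):
    """Prepare the per-posting template once, outside the delivery loop."""
    # Escape literal percent signs so %-interpolation leaves the body alone.
    safe_body = shared_body.replace("%", "%%")
    return "To: %(address)s\n\n" + safe_body + "\n-- \nSent to %(address)s\n"


def weave(template_text, members):
    """Yield (address, personalized_text) for each member dictionary."""
    for member in members:
        # Only a dictionary lookup and one string substitution per copy.
        yield member["address"], template_text % member


if __name__ == "__main__":
    template = build_template("Save 50% today")
    members = [{"address": "a@example.com"}, {"address": "b@example.com"}]
    for address, text in weave(template, members):
        print(address, len(text))
```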
-Barry

On Jan 30, 2004, at 4:59 AM, Brad Knowles wrote:
fair cop. you're right, Brad. I was tired, didn't think it through. But I still think the user experience issues trump the Network/disk issues. we're here to make life easier for people, not computers.
yup. and trust me, some of us are working on that part, too...

At 9:04 AM -0800 2004/01/30, Chuq Von Rospach wrote:
I agree. We are here to make life easier for people, and we
should do whatever we can towards that goal.
However, there are some things that are beyond our capabilities.
Some features require a significant amount of additional effort on the part of the server, and while that is desirable and feasible for most mailing lists, there may well be some mailing lists where that's just not possible. We have to acknowledge that issue.
yup. and trust me, some of us are working on that part, too...
I know. Big Mac is the third most powerful supercomputer cluster
in the world, and cost much, much less than the two clusters that are more powerful than it is, and probably much less than any other cluster in the Top 100.
But there are still limits, especially in the areas of certain
types of common technology, such as disk drives.
-- Brad Knowles, <brad.knowles@skynet.be>

At 10:41 PM -0800 2004/01/29, Chuq Von Rospach wrote:
I've written a couple of servers that do this.
That do what?
I think every server
should now, so all of mine do.
What do they do?
-- Brad Knowles, <brad.knowles@skynet.be>

At 5:45 PM -0800 2004/01/29, Somuchfun wrote:
That is actually not true. I tested both Gordano's communicator and Lyris Listmanager and both are able to handle this requirement without a problem.
I would like to see the evidence of this claim.
I've been doing mail systems for over ten years, and I used to be
the Sr. Internet Mail Administrator for AOL. I can't imagine how they could be doing this sort of thing short of using their equivalent of personalization mode, which defeats the purpose of a mailing list.
-- Brad Knowles, <brad.knowles@skynet.be>

On Jan 29, 2004, at 2:44 PM, Brad Knowles wrote:
If you don't personalize, you can't do it. Why? Because with personalization off, you're sending one email in a batch to 1-n users. Since it's one copy to more than one user, then you can't individualize it (makes sense, when you think about it, right?)
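The difference can be made concrete with a small Python sketch that plans the SMTP transactions for one posting. The function name `plan_envelopes` and the batch limit of 500 recipients per transaction are assumptions chosen for illustration; the 50,000 figure is the list size mentioned elsewhere in this thread.

```python
def plan_envelopes(recipients, personalize, max_rcpts=500):
    """Return one recipient group per SMTP transaction.

    Without personalization, one identical copy is shared by up to
    max_rcpts recipients, so nothing in it can differ per recipient.
    With personalization, every recipient gets their own transaction
    and therefore their own copy, which is what makes a per-recipient
    header or footer possible.
    """
    if personalize:
        return [[addr] for addr in recipients]
    return [recipients[i:i + max_rcpts]
            for i in range(0, len(recipients), max_rcpts)]


if __name__ == "__main__":
    subscribers = ["user%d@example.com" % i for i in range(50000)]
    print(len(plan_envelopes(subscribers, personalize=False)))
    print(len(plan_envelopes(subscribers, personalize=True)))
```

Under these assumptions the same posting costs 100 transactions in batch mode and 50,000 with personalization on.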
So that header is what personalization mode is all about, really -- personalization. turn off personalization, but still personalize?
What I do is use this as a footer:
works great with AOL, as long as it's not a digest being reported. That's so rare I don't bother with it.

On Fri, 2004-01-30 at 01:40, Chuq Von Rospach wrote:
works great with AOL, as long as it's not a digest being reported. That's so rare I don't bother with it.
Most lists on python.org have converted to doing the same. I just noticed mailman-developers didn't include %(user_optionsurl)s in its footer even though it was personalizing.
<type> <type> <type>
Look down and enjoy. -Barry
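For readers unfamiliar with the `%(user_optionsurl)s` notation, it is ordinary Python dictionary interpolation. The sketch below shows how such a footer could be filled in per recipient; the footer wording, the `render_footer` helper, and the example URL are assumptions, and only the `user_optionsurl` name comes from the message above.

```python
FOOTER_TEMPLATE = (
    "_______________________________________________\n"
    "To change your options or unsubscribe, visit:\n"
    "%(user_optionsurl)s\n"
)


def render_footer(template, user_optionsurl):
    """Fill the per-recipient options URL into a footer template."""
    return template % {"user_optionsurl": user_optionsurl}


if __name__ == "__main__":
    print(render_footer(FOOTER_TEMPLATE,
                        "https://lists.example.org/options/demo/alice"))
```

Because the URL is unique to each subscriber, it survives in a forwarded complaint and identifies who to unsubscribe, which is why this only works when personalization is on.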

On Thu, 2004-01-29 at 17:21, Somuchfun wrote:
This header needs to be created whether mailman runs in personalization mode or not.
So the questions is not can mailman do it or not?
It would not be hard to add this to Mailman, when full personalization is turned on. Two lines of code.
The difference between 'regular' personalization and full personalization is that the latter inserts a custom To header for each recipient. There's no reason why Mailman couldn't also add an X-800-Pound-Gorillas-Can-Hurt header with whatever information makes your life easier.
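A hedged sketch of what such an addition might look like, using Python's standard `email` package rather than Mailman's actual delivery handler (which is not shown in this thread). The header name `X-Client-Id` follows the "something like an x-client-id header" suggestion from the original request; the function `personalize_copies` is hypothetical.

```python
import copy
from email.message import Message


def personalize_copies(template, recipients, header_name="X-Client-Id"):
    """Yield (address, message) with a per-recipient To and extra header."""
    for addr in recipients:
        msg = copy.deepcopy(template)  # leave the shared template untouched
        del msg["To"]                  # no-op if the header is absent
        msg["To"] = addr               # the custom To of full personalization
        msg[header_name] = addr        # the extra header itself
        yield addr, msg


if __name__ == "__main__":
    base = Message()
    base["Subject"] = "Weekly digest"
    base.set_payload("Hello list\n")
    for addr, msg in personalize_copies(base, ["a@example.com"]):
        print(msg.as_string())
```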
The question isn't whether it can be done, but whether we should add it to Mailman for everyone. I'm not interested in adding even more configuration options to control this, whether in Defaults.py or on the admin pages.
-Barry

On Fri, 2004-01-30 at 15:22, Bob Puff@NLE wrote:
Yes, that would be VERY helpful. I've had a couple instances where that would have helped me. Turn it on for all.
Turn what on? So far there aren't any specific proposals, i.e. "Add this header and make it contain that information".
-Barry

At 2:21 PM -0800 2004/01/29, Somuchfun wrote:
In personalization mode, this kind of information could
theoretically persist in the headers (with suitable source-code modifications). Otherwise, I don't think there's a mailing list manager on the planet that could make this happen.
-- Brad Knowles, <brad.knowles@skynet.be>

To enable custom headers for each message at least partially destroys the intent of a mailing list - efficient delivery of messages. If each message is customized that way then you have to actually send one distinct message to every user instead of sending a single message with multiple RCPT-TO lines. Enabling this would serve to radically increase the server overhead that Mailman causes (back to the Smartlist olden days). Instead of injecting one message to hundreds of recipients, you would be injecting hundreds of unique messages.
This is hardly a weakness of mailman and instead a weird bit of behaviour on AOL's part.
IMHO of course.
Cheers
-- |) __,,_____________ moron : <moron@industrial.org> (| |) < ___________/ EEEI news : <infosuck@industrial.org> (| |) / /-' musician community : http://ampfea.org (| |) /___/ industrial & DIY culture : http://industrial.org (| |) deterrent industries : http://deterrent.net (|

On Jan 29, 2004, at 6:03 PM, moron@industrial.org wrote:
Sorry, I don't buy this argument. If you have two choices: use more CPU time and network, or improve the end-user experience, choosing "less work for the computer" is almost always the wrong answer.
Yes, you are. And fortunately, it's not the 70's any more, and the resource limitations that caused those design decisions are gone. We aren't on 9600 baud dialups any more, for instance, or trying to run large mailing lists on 286 class machines. well, I'm sure there are a few of those still out there, but that's no reason to hobble the rest of the universe with designs aimed at the last century.

On January 29, 2004 10:44 pm, you wrote:
Howdy. I do not understand why you would feel that adding a personalized header makes the list experience any better. Would Usenet be any better with a customized header for every news article you read? There is a big difference in my mind between a discussion mailing list and a marketing system with "Dear <insert name here>" type pseudo personalizations (a bit like a phone system inserting your name into the "please do not hang up" messages when you are on hold).
I am also not convinced about the CPU argument. That's a lot of extraneous message IDs to keep track of in databases, bounce detecting, etc. Instead of being able to deliver to 100 AOL users at once you suddenly have to send 100 separate messages. Multiply that by a busy list (some of the ones I look over are up to 150 - 200 a day sometimes) and it is still significant, especially if binaries are involved. I also wonder what effect it will have on archiving (I am not immediately sure but it could be ugly depending on whether it affects the threading complexity). Another side effect is that some servers try to block large volumes of connections from servers as an anti-spam measure (Shaw Cable here in Canada did this recently) which this would be far more likely to trigger. It's not Mailman's problem of course but something to keep in mind.
But if it works for you, hey go nuts. But the argument to me sounds dangerously similar to the one Microsoft used to push using HTML in email which we are all still feeling the unfortunate fallout from (zero cognitive benefit, plenty of headache). Just because computers are faster now does not mean that resources are suddenly free (as in beer).
Respectfully, IMHO of course.
Cheers
-- ---> (culture) http://industrial.org : (label) http://deterrent.net ---> (community) http://ampfea.org : (hire me) http://codegrunt.com ---> (send EEEI news to) infosuck@industrial.org ---> Whomever dies with the most URLs wins!!!!!!!!!!!!!

On Jan 29, 2004, at 11:04 PM, moron wrote:
Lots of research with end-users, studying their needs and researching the places that they struggle using these systems, and having designed and built a number of list servers over the years that are used by a wide range of users, not all of them geeks.
Would Usenet be any better with a customized header for every news article you read?
different argument. you don't need that user data to unsubscribe from a usenet group. You do to unsubscribe from a list server.
There is a big difference in my mind between a discussion mailing list
Not from the point of view we're talking about here, which is giving the user the info they need to operate the list properly.
With the exception of network traffic, it's actually pretty trivial stuff. No, I can't explain how I know, but I've been there, done that. The only huge cost is the network bandwidth change, which is at least 2X, and can be 5X, depending on your old configurations.
um, heh. Busy. (grin)
I also wonder what effect it will have on archiving
none.
Um, of course, the fact that users want html email is irrelevant. Lots of studies show they prefer the look of HTML to text, actually. Except in the more hard-core geek crews, but we aren't writing stuff here JUST for people who run mutt, right?
you might be surprised. Lots of benefit, no headaches.
chuq (guess what I do for a living?)

On January 29, 2004 11:19 pm, Chuq Von Rospach wrote:
Howdy. Again, how does including an extra header help the end user experience? The original complaint was due to AOL being bass ackwards and somehow feeling that an email address in an arbitrary header was more "private" than the To field (which of course it is not). In this scenario, the "customization" was simply to add the sender address back into the message which is hardly making the end experience any better (it should already be in something like "envelope-to" anyway).
How about something more concrete as to why this is such a great feature for something beyond a spam list? (I am *not* suggesting you are a spammer, just that customization would seem to be only really important for commercial mailouts which generally fall under the spam-brella).
different argument. you don't need that user data to unsubscribe from a usenet group. You do to unsubscribe from a list server.
The problem is AOL though, not Mailman. Solution? Switch to a real provider that uses RFC compliant software. And be vocal as to why you are leaving.
Not from the point of view we're talking about here, which is giving the user the info they need to operate the list properly.
The information they need is that AOL is running a broken SMTP server, no?
Ok though I have not seen evidence of this using Exim. But a 2 to 5 times increase in bandwidth use is a lot. The majority of traffic in the community server I look after at the moment is due to mail and we would definitely feel that.
Well, it depends on membership of course and the total number of lists. It's big enough for me to look after.
=)
But a 60,000+ member list on Mailman would suck due to the administration interface anyway.
Are you sure of that? I thought that Hypermail based its threading on message IDs which would be different in this case leading to far larger arrays and such to keep track of what article was connected to what. I could see this having an exponential effect on the length to regenerate archives and for building indexes. That could be a LOT of RAM usage. Any Hypermail gurus want to comment? Am I flapping in the wind on this one?
Hmm. I have yet to see a case where HTML has helped readability, and folks that use it seem to do so solely because it is there, not because they are trying to impart meaning. Some people like using their cell phones in theatres but that doesn't make it a positive feature. When using webmail interfaces, for example, no one misses it that I have ever noticed. As to the effect of HTML, even ignoring the obvious security and privacy nightmare, you still get horrific rendering problems depending on the exact path the message takes from sending client to viewing client. Nothing like a missing table tag to make your message unviewable.
chuq (guess what I do for a living?)
Debate with me?
=)
Cheers
-- ---> (culture) http://industrial.org : (label) http://deterrent.net ---> (community) http://ampfea.org : (hire me) http://codegrunt.com ---> (send EEEI news to) infosuck@industrial.org ---> Whomever dies with the most URLs wins!!!!!!!!!!!!!

At 11:51 PM -0800 2004/01/29, moron wrote:
Howdy. Again, how does including an extra header help the end user experience?
It doesn't. Enabling personalization does.
That is pretty bloody stupid, and is the real issue that we
should be discussing.
The problem is AOL though, not Mailman. Solution? Switch to a real provider that uses RFC compliant software. And be vocal as to why you are leaving.
Your recipients are where they are. You can't really make them
move. You can refuse to accept any recipients on AOL, but that's about it.
The information they need is that AOL is running a broken SMTP server, no?
Not a broken SMTP server, per se. It's broken policies with
regards to handling spam and mailing lists, and their stupid sanitization methods which they are asking you to work around so that the very information they sanitized is exposed elsewhere in the message.
Nevertheless, I was working at AOL when they implemented their
current SMTP server, and I can confirm that it is pretty badly broken in plenty of other ways.
Well, it depends on membership of course and the total number of lists. It's big enough for me to look after.
I guess that if you're running all the mailing lists for
apple.com, then you don't really care about increased bandwidth charges, or any of the other increased costs of running the mailing lists.
No, Chuq is right about this. The archiving is done on message
input, which is not changed as a result of personalization on message output.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 02:51, moron wrote:
Ok though I have not seen evidence of this using Exim. But a 2 to 5 times increase in bandwidth use is a lot.
C'mon, isn't legitimate mail of any kind now just noise in the spam/virii storms? You won't even notice it. :)
-Barry

At 11:19 PM -0800 2004/01/29, Chuq Von Rospach wrote:
With the exception of network traffic, it's actually pretty trivial stuff.
Uh, no. Not just "no", but "Hell, no!"
Increased network traffic is one cost, yes. But there are plenty
of other additional costs as well, some of which are considerably more important.
And you might be surprised. Lots of benefit for the users, yes.
Lots of cost on the system, also.
chuq (guess what I do for a living?)
I know what you do for a living, and I know that you know what I
have done for a living.
The issue is not as simple as you make it out to be.
-- Brad Knowles, <brad.knowles@skynet.be>

At 10:44 PM -0800 2004/01/29, Chuq Von Rospach wrote:
You know damn good and well that this is not a CPU issue. This
is a disk I/O capacity issue (synchronous meta-data updates). Moreover, you also know full well that there are serious performance issues with enabling personalization mode on large mailing lists, such that for some lists, it would simply be impossible to do.
The increased CPU utilization and network bandwidth can be
problems for some sites, but that is not the gating factor in most cases.
This is not a valid criticism. As network bandwidth has
increased, the numbers of messages being sent and the size of the messages being sent has also increased, and the number of recipients has also increased. What has *not* increased significantly is disk I/O latency, which is the gating factor for synchronous meta-data updates.
You've got a significant increase in demand along three separate
axes, without a corresponding increase in capacity. Something has to give. We have to be more intelligent about how we deliver those messages, or the entire system grinds to a halt.
We aren't on 9600 baud dialups any more, for instance, or
trying to run large mailing list on 286 class machines.
See above. The CPU being utilized is irrelevant. What is not
irrelevant is disk I/O latency, a fact that I know that you know as well as or better than most.
-- Brad Knowles, <brad.knowles@skynet.be>

Brad Knowles wrote:
Why is it, then, that Lyris can send personalized messages to lists with hundreds of thousands of members with no problem? I don't personally have any lists that are nearly that big but I can tell you that my Lyris box sends messages to my lists with a few thousand members extremely quickly. Having personalization as a *choice* is the best thing. Then, those who worry about disk I/O or whatever can live with non-personalized delivery (at the expense of the users, of course), and those who want to move forward into the 21st century can do so with personalized delivery.
Mailing list communities want more now. Especially in Communities of Practice. Our most recent request was to tack on a person's professional profile (from another datasource) on the end of each message he or she sends. Feasible? Maybe, maybe not. But people do want this kind of thing. And I get paid to deliver what is needed. The fact is that Lyris does personalization just fine. So why continue to let Mailman lag behind?
Barry and others will be (or are) working on Mailman 3. I think that he/they should take a long hard look at the commercial MLM success story (Lyris) and take a few pages out of that book. They spent millions of dollars on R&D and made decisions based on it. Why not tap into that? Personalized delivery is just one thing. Don't get me started on SQL issues and the need for vastly improved logging for forensic purposes.
- Kevin

On Jan 30, 2004, at 5:52 AM, Kevin McCann wrote:
Lyris has made the choice it's worth it. So has mailman with personalization.
Brad is right that I trivialized some resource issues last night -- but that doesn't change my belief that for the user, it's worth using those resources to improve things for them. You don't want to waste resources; you also don't want to not use them when it's the right thing to do.
Better yet, look at the users. They aren't geeks any more. They're your mom and dad, off on a cable modem somewhere. A cable modem provider that has been through two or three acquisitions and domain name changes, and these folks aren't really sure what their email address is (their smart son configured their computer for them), much less what it was three changes ago when they signed up for the list.
and now they want to turn it off and go on vacation for a month, and the plane leaves in six hours.
If they can't -- it's your fault, as admin. And they're right.
that's why personalization matters.

--On Friday, January 30, 2004 08:52:05 -0500 Kevin McCann <kmccann@bellanet.org> wrote:
Why is it, then, that Lyris can send personalized messages to lists with hundreds of thousands of members with no problem? I don't personally
If you'd read your own thread, you'd know the answer already. Lyris is its own MTA - it speaks SMTP directly to the recipients' mail servers. This allows it to do on-the-fly customization at SMTP transmit time instead of having to queue each unique message.
This is _very_ _hard_ to get right, just to do SMTP properly. Personalization makes it even more difficult. Simple example: someone's mail server is down. Do you:
- queue the personalized message
- queue the message template, and the list of undelivered recipients
- queue the message template, the list of undelivered recipients, and the substitution db version
Each choice has significant implications. None is obviously correct.
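The three queueing choices above can be sketched in Python. This is only an illustration of the trade-off, not how Lyris or any MTA actually stores its queue; the class names, the `$name` template syntax, and the `snapshots` lookup are all hypothetical.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, List, Optional


@dataclass
class RenderedEntry:
    """Choice 1: queue one fully personalized message per recipient."""
    recipient: str
    text: str


@dataclass
class TemplateEntry:
    """Choices 2 and 3: queue the template once plus the undelivered recipients.

    db_version is None for choice 2 (re-render from whatever data is current)
    and set for choice 3 (re-render from the data as it was at send time).
    """
    template: str
    recipients: List[str]
    db_version: Optional[int] = None


def render_retry(entry: TemplateEntry,
                 current_db: Dict[str, dict],
                 snapshots: Dict[int, Dict[str, dict]]) -> Dict[str, str]:
    """Re-create the personalized text for every undelivered recipient."""
    if entry.db_version is None:
        data = current_db                      # choice 2: may differ from the first attempt
    else:
        data = snapshots[entry.db_version]     # choice 3: same text as the first attempt
    return {rcpt: Template(entry.template).safe_substitute(data[rcpt])
            for rcpt in entry.recipients}
```

Choice 1 costs the most queue space, choice 2 can send a retry that differs from the first attempt if the subscriber data changed in the meantime, and choice 3 avoids that only by keeping old versions of the substitution data around.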
It would be great if MTAs included this functionality, but there are major political players who are terrified this will just be used by spammers. Personally, I think spammers could do it trivially already, as they don't care about queueing mail properly and handling all the edge cases for SMTP. But I'm not the maintainer of postfix/exim/sendmail/etc., so it's not my decision.
I'll make you a deal - you write the MTA, and I'll add support in mailman to offload the personalization.
-- Carson

Carson Gaspar wrote:
Fair dues.
I'll make you a deal - you write the MTA, and I'll add support in mailman to offload the personalization.
I do not personally have the skills to do this but I wouldn't rule out trying to get the funding to help make it happen. I wonder if there is enough collective know-how among Mailman developers and other interested parties. Let me ask: if you don't see this as being a priority now, do you see it as being such in 2 years, 3 years, five?
More than anything I'd like to see an open source MLM that can keep up. One that can meet the ever growing list of challenges as well as expectations. So, looking down the road, where do others see things going? Should the OS MLM status quo remain, or ought there be an effort to plan for the future? Now that Mailman 3 is on the table, is a built-in MTA an issue for discussion, or is it completely unrealistic? Will it always be unrealistic?
- Kevin

At 1:23 PM -0500 2004/01/30, Kevin McCann wrote:
Do you have any concept as to how much work has gone into
sendmail over the past twenty-plus years? Or postfix, or Exim? They tend to get most things right, but even now they have plenty of problems -- they just have fewer problems than most other MTAs.
You might be talking about a change that would require
essentially throwing out everything that has been done before, and starting over from scratch.
Do you have the millions of dollars and human-centuries worth of
productive coding that it would take to write yet another MTA properly?
More than anything I'd like to see an open source MLM that can keep up.
There are some things that Barry has already ruled out. Writing
a custom MTA for Mailman is one of those things.
Don't even bother barking up this tree.
Perhaps, for Mailman 3, Barry could talk to people like Eric
Allman, Wietse Venema, and other solid MTA authors, to see if there is a way we could get a certain amount of message customization pulled into the MTA, without killing the performance of the machine.
But that's a question that Barry would have to answer.
-- Brad Knowles, <brad.knowles@skynet.be>

Brad Knowles wrote:
I'm simply thinking about MLM challenges, which are increasing every day, it seems, and thinking about what can be done. Personalization without critical slowdown is an issue. I am thinking out loud about what can be done. I don't have the answers, which is why I invite discussion. Pleasant discussion, if you can find it within you. I have not been on mm-dev forever, so I am not aware of every thread that has been discussed. Further, am I willing to contribute? Yes I am. Millions, no. But something, yes. Give a guy a break.
- Kevin

On Jan 30, 2004, at 11:42 AM, Kevin McCann wrote:
I'm simply thinking about MLM challenges, which are increasing every day, it seems,
I disagree. the MLM stuff is doing quite well. There are challenges at the e-mail level, but non-MLM-email suffers as badly as MLM-email. And I really think the spammers have moved into their own version of the "battle of the bulge", an increasingly difficult fight with ever reducing gains. It's ugly right now, but it seems to me that's at least in part due to an increasing sense of urgency by the spammers.
(more here if you care: http://www.plaidworks.com/chuqui/blog/001252.html)
and thinking about what can be done. Personalization without critical slowdown is an issue.
define "critical". personalization is inherently more resource intensive than not personalizing. Physics wins. it's more difficult to send 20 emails to 20 people than one email to 20 people; and the reality is, you can't personalize without sending those 20 emails.
Can things be improved to minimize that resource cost? Definitely. Is it a high priority? For the vast majority of mailman users, no.
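To make the "20 emails versus one email" point concrete, here is a minimal Python sketch. It is not Mailman's delivery code. The `X-Client-Id` header name follows the suggestion in the message that opened this thread; the function names and the example.com addresses are made up for illustration.

```python
import copy
from email.message import EmailMessage


def build_message(sender, subject, body):
    """Build the post as it arrives at the list."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def shared_deliveries(msg, recipients):
    """Non-personalized: one message text, one envelope naming every recipient."""
    return [(list(recipients), msg.as_string())]


def personalized_deliveries(msg, recipients):
    """Personalized: one distinct message text and one envelope per recipient."""
    deliveries = []
    for rcpt in recipients:
        personal = copy.deepcopy(msg)       # leave the original post untouched
        personal["X-Client-Id"] = rcpt      # hypothetical per-recipient header
        deliveries.append(([rcpt], personal.as_string()))
    return deliveries


recipients = [f"user{i}@example.com" for i in range(20)]
post = build_message("list@example.com", "Hello", "Same body for everyone.\n")
print(len(shared_deliveries(post, recipients)))        # number of messages handed to the MTA
print(len(personalized_deliveries(post, recipients)))  # one per recipient
```

Each tuple stands for one message injected into the MTA, so the second list is what the MTA has to queue and deliver when personalization is on.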

On Fri, 2004-01-30 at 14:20, Brad Knowles wrote:
I'd support such an effort. I think the right way to go about this would be to design a protocol (or perhaps an API) for MLM/MTA communication. I'd be less enthusiastic about a solution that was unique to a particular MTA.
Hey Brad, maybe you can dust off those protocol specs you once did. Or maybe you can make some first passes at a new specification. That would be a good jumping off point for approaching the MTA communities.
-Barry

At 3:16 PM -0500 2004/01/30, Barry Warsaw wrote:
Agreed. I think that would be a poor choice.
Sorry, we never got to the specs stage. We got to talking about
things, then talking about the spam issue, and then the whole idea basically died right there.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 13:23, Kevin McCann wrote:
It's possible. But then again who knows what the email landscape will look like in 5 years? I'm betting it'll look a /lot/ different than it does today, unless it doesn't.
More than anything I'd like to see an open source MLM that can keep up.
When you say "keep up", remember that because of the design decisions I outlined in my previous message, you're really talking about the MLM+MTA combination.
For all practical purposes, it's unrealistic for Mailman 3. Our biggest challenge is going to be avoiding second system syndrome, and we'll have no hope if we don't limit scope.
OTOH, I think it's important to keep this in mind as we design the delivery subsystem. I think we win if MM3's architecture makes it technically possible, even if it remains insane for us to attempt it given our current level of resources.
-Barry

Definitely. There are probably a dozen on this list who could write an MTA -- or have. The problem is, that dozen or so folks all (and I hope I speak for those people appropriately as I speak for myself) have come to the realization that it's rarely if ever cost-effective or worth the effort.
A secondary issue is there are more and more mutterings and grumblings that it's time to get serious and replace SMTP. If you integrate an SMTP server into Mailman and we go off and replace SMTP, where are you? out on a limb with a chain saw.
While Lyris has a lot going for it, its tightly coupled MLM/MTA is a feature that's a mixed blessing. Now, if SMTP is replaced properly and the warts any MTA has to deal with (Hello, Lotus Notes. Hello, exchange. hello, you know who you are) can get scraped off and not replaced with new warts, interfacing at the MTA level might be more practical.
But I wouldn't recommend it, support it, or encourage it with Mailman. not now, not in a year, not in five. Not to SMTP.
Mailman has a lot of things to do to become an even better mailing list manager before we should even think about trying to re-implement what the MTA teams are already spending all of their time on.
And I think we can do within Mailman what you think you need to integrate an MTA to do, without all of that pain and suffering. Or at least enough of it to not warrant going through the swamp to get there.
And trust me, SMTP is a swamp, with lots of hungry alligators.

Here is what I do not understand from the discussion: Mailman in its current form is slow, and if personalization is turned on, users cannot even get into the mailman site anymore because it takes up all available resources. We are running a list with about 50,000 subscribers. As an admin I do not really care whether some people think AOL has its act together or not - if I want my emails to reach them then I have to play by their rules.
Like I said, I have tried other software on the market and used its personalization feature. I even tried the same list on the same machine. With personalization, Mailman needed about 8.5 hours to send out one message to all 50,000 people and Lyris ListManager needed about 4.5 hours. Is disk I/O a problem? Of course it is, but it is a problem for all list managing software packages. My experience is that mailman is just very slow when it comes to db access. Just try to add 10,000 users at once and most likely you get a timeout. So perhaps mailman is better for smaller discussion list than for larger email lists.
Some people here have suggested that anything besides email discussion lists are spam, I find statements like this alarming. We run a newsletter where people actively want to get the newsletter and we do not consider ourselves spamming these people. In fact we try very hard to comply with all rules, regulations and expectations - more so than some ISPs.
All I want is a fast and cheap engine that can help me reach my goal - to get the email to my customers quickly and to offer easy management capabilities. So far I like mailman's management capabilities. The performance has left me being disappointed.
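For scale, the two delivery times quoted above work out to roughly 1.6 and 3.1 messages per second. A short Python sketch of that arithmetic, using only the figures given in this message (the function name is made up):

```python
def messages_per_second(recipients, hours):
    """Average delivery rate over the whole run."""
    return recipients / (hours * 3600)


mailman_rate = messages_per_second(50_000, 8.5)   # about 1.63 msg/s
lyris_rate = messages_per_second(50_000, 4.5)     # about 3.09 msg/s
slowdown = 8.5 / 4.5                              # about 1.89x

print(f"Mailman: {mailman_rate:.2f} msg/s")
print(f"Lyris:   {lyris_rate:.2f} msg/s")
print(f"Mailman took {slowdown:.2f}x as long")
```

These are averages over a single run on one machine, so they say nothing about where the time went (message generation, MTA queueing, or remote delivery).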

Howdy. I would think that Mailman's job is not to provide free marketing tools but to act as a list processor. For what it offers it is the best trade off of features, performance and price going for small to medium sized lists. If you want Lyris you should pony up and pay for it IMHO.
If all you want is a customized one way mailout then it doesn't sound like you are looking for a mailing list processor as much as a mass mailer and there are other options for that kind of thing.
But that's just my opinion of course.
Cheers
-- |) __,,_____________ moron : <moron@industrial.org> (| |) < ___________/ EEEI news : <infosuck@industrial.org> (| |) / /-' musician community : http://ampfea.org (| |) /___/ industrial & DIY culture : http://industrial.org (| |) deterrent industries : http://deterrent.net (|

On Jan 30, 2004, at 11:10 AM, moron@industrial.org wrote:
Mailman is a tool. Asking it to discern intent in its use is like asking a gun to only shoot bad people. The gun does what it's told. So does Mailman.
And your view of this stuff is very simplistic, IMHO. The real world is a lot more complex.
But that's just my opinion of course.
ditto, of course.

At 10:49 AM -0800 2004/01/30, Somuchfun wrote:
Which may be because they have implemented their own custom MTA,
something that very few other MLMs in the world have done or can do. Listserv with LSMTP being the only other example I can think of off the top of my head.
Is disk I/O a problem? Of course it is, but it is a problem for all list managing software packages.
It can be less of an issue for those MLMs that have implemented
their own custom MTA.
So perhaps mailman is better for smaller discussion list than for larger email lists.
Yup. If your list is too big for Mailman, maybe you need to find
a different MLM. Perhaps some day Mailman will have had the performance increased enough that it could handle lists that large, but maybe it can't handle them today.
Keep in mind that this is not a problem for 99% of the lists out
there that are handled today with Mailman, and there are even lists with over 200,000 recipients in operation, which are running just fine with Mailman.
Maybe Mailman is not able to handle that load on the machines you have.
So far I like mailman's management capabilities. The performance has left me being disappointed.
Perhaps it is the wrong software for your application.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 13:49, Somuchfun wrote:
I don't totally believe that.
Understand that IMO, MM2.1's biggest architectural flaw right now is the list data storage arrangement. Pickles and list locks simply do not scale. Fixing this is a priority for MM3, but of course there will be costs. Backing Mailman with a real database (be it BerkeleyDB, MySQL, or whatever) increases the administrative costs. No way around it.
That said, MM2.1 does not retain a list lock while it delivers messages to its MTA, so it should not lock out other access to the site.
Again, the issue is likely deeper than it first appears. I will bet you that Mailman's "db access" is about as fast as you can possibly get, because the list data resides completely in memory. Lookups are a simple dictionary access, which is very fast.
Where I believe you're getting clobbered is in the specific code that generates the unique recipient copies. The technique I'm using is about as good as you can do in Python 2.1, which is MM2.1's minimum requirement. I can do a lot better if we set Python 2.3 as a baseline and make other incompatible changes. That's why it's all pushed off to Mailman 3.
And being twice as slow as Lyris is actually not bad, IMO. Lyris is probably written in C or C++. For a pure Python application like Mailman to only take twice as long is not bad.
So perhaps mailman is better for smaller discussion list than for larger email lists.
As Mailman gains in popularity, people will try to make it do things it wasn't necessarily designed for, or that weren't conceivable 6 years ago when many of the basic architectural decisions were made.
Some people here have suggested that anything besides email discussion lists are spam, I find statements like this alarming.
Spam is anything the user doesn't want to get.
I have no problem with that.
So how much would you pay to improve Mailman's performance? If we could raise a quarter million dollars in development funds, I doubt you'd be disappointed for long <wink>.
-Barry

At 10:49 AM -0800 2004/01/30, Somuchfun wrote:
Of course, this doesn't address the issue of MTA performance
tuning. I've seen situations where proper tuning resulted in a factor of ten (or more) improvement in the delivery times. See <http://www.usenix.org/publications/library/proceedings/lisa97/full_papers/21... > and <http://www.usenix.org/events/lisa98/full_papers/chalup/chalup_html/chalup.ht... > for two papers discussing this issue.
See also my slides at
<http://www.shub-internet.org/brad/papers/sendmail-tuning/> and <http://www.shub-internet.org/brad/papers/dihses/>, and the book _Sendmail Performance Tuning_ by Nick Christensen (at <http://www.jetcafe.org/~npc/book/sendmail/>).
If you haven't done your job in tuning the performance of the
MTA, you really don't have much reason to complain about the performance of a mailing list manager with a lot of recipients.
-- Brad Knowles, <brad.knowles@skynet.be>

On Jan 30, 2004, at 10:49 AM, Somuchfun wrote:
We are running a list with about 50,000 subscribers.
that's a fair sized list, yes. What's it running on? Are you asking too much of your hardware?
The people who think you can just ignore AOL have a really unrealistic view of the real world where most of us live.
(on the other hand, if you look at the latest numbers out of the direct marketing associations, AOL shed 800,000 paying customers last QUARTER. Of those, 450K were converted to a non-revenue style "incentive" account (something similar to "N months free if you agree to stay a year"), but another 390K cancelled anyway despite being given that incentive.
By my count, that's over 3% of their user base -- in a quarter. And Morgan Stanley's analysts are saying they're expecting that loss to top a million paying accounts this calendar year, so unless AOL can figure this out, we're talking serious death spiral numbers. If you lose 1 out of every 30 customers in a three month period, something's seriously ugly...)
And mailman is free and volunteer based, and Lyris, well, very much isn't. And that definitely makes a difference. there is a TANSTAAFL aspect here...
that is true of almost all MLM's. there are very few specifically optimized for large-scale operations, and 50K is fairly large (well, not for me, but for most of the world). and I admit upfront I don't run any of my large lists on Mailman. They all run on custom built systems optimized for those operations. (and we're hiring help to work on these things, I just posted pointers to more info separately)
they room with the folks who think you can tell AOL to go to hell... (grin)
I'll tell you what: if you find a better free and open source MLM than Mailman for your needs, I'll buy you a nice dinner (because you won't). At some point, "off the shelf" solutions stop scaling, no matter what they are. And at some point, either you find a company like Lyris and pay for their expertise, or find a geek like me or JC and pay us for ours. Even though both of us also volunteer time back to Mailman as well, as do Barry and the other key developers, and we have the knowledge to take Mailman and build a tool that'd blow Lyris off the map (and we do), this ain't our paying job, and what we don't have is the time to do it. Nor, for 95% of the people who use Mailman, do we need to...

At 8:52 AM -0500 2004/01/30, Kevin McCann wrote:
Why is it, then, that Lyris can send personalized messages to lists with hundreds of thousands of members with no problem?
Maybe they have their own custom MTA that is tightly integrated
into the mailing list manager.
Sending messages to large mailing lists very quickly is not a
problem. Doing so with personalization turned on, is a problem.
Personalization is a valid choice. Probably 99% of mailman lists are small enough that the additional performance cost caused by turning on personalization doesn't cause too many problems.
At issue is that other 1% of the largest mailing lists where
turning on personalization would not be feasible.
The fact is that Lyris does personalization just fine.
I don't doubt that Lyris can handle personalization just fine.
For that matter, so can Mailman. At issue is what cost do you pay to turn on personalization?
So far, I have seen nothing that leads me to believe that Lyris
is capable of doing this without doing a single delivery per recipient, which is exactly the same thing that Mailman has to do in order to achieve the same goal.
So why continue to let Mailman lag behind?
If it requires implementing a custom MTA, that's not going to
happen. Barry has already ruled that out.
If you want that kind of thing, go with Listserv and LSMTP.
Hey, give Barry a few million dollars to fix up Mailman properly,
and I'm sure that he could come up with a way to write a custom MTA (or do whatever else is necessary) to make it competitive with other MLMs out there.
Short of that, try contributing some code yourself to solve these problems.
Mailman already does personalization. If that's what you want,
then stop complaining now.
-- Brad Knowles, <brad.knowles@skynet.be>

On Fri, 2004-01-30 at 08:52, Kevin McCann wrote:
And I think we can make Mailman clear its queue of a message very quickly, even with full personalization turned on. How Mailman 2.1 does personalization is not as efficient as it could be, for technical reasons I won't go into right now. I believe we can make Mailman more efficient here.
We've made the decision to not assimilate the MTA into Mailman. The big advantage here is that writing MTAs is hard, takes a huge amount of resources we don't have, and we can leverage the many good open source MTAs out there. The disadvantage is that we're going to pay for personalization in MTA disk i/o. Mailman 2.1 won't get clobbered here (see my previous messages on the subject, and remember that during the betas, MM2.1 /did/ queue each personal message to disastrous results), and Mailman 3 will be better.
I totally buy that personalization improves the user experience, even for discussion lists. I think it's basically a no brainer, all other things being equal. I believe we're making the right choice here because we can support a wide range of system configurations. Small sites either can't afford even a moderate increase in cpu or bandwidth (they turn off personalization), or can and don't worry about i/o because their traffic is light. Larger sites can afford fast disks, mta smurf farms, and other measures to mitigate the i/o requirements of the mta. Huge sites can write their own special Python delivery module to speak WPMP (Wizzy Personalized Mail Protocol) to their custom in-house blindingly fast weave-it-on-the-wire mail server.
Exactly.
Yes.
So Kevin, you coming to PyconII? I still don't have (m)any volunteers joining me in a Mailman 3 sprint. :(
-Barry

Barry Warsaw wrote:
Hi Barry,
Thanks for your cordial and helpful response. If I can get up-to-speed with Python in order to work on the MySQL side of things, or if you think I could contribute with just the MySQL know-how, I'll go. Otherwise, I'll be sending someone else. I'd ideally like to send someone from one of our partner organizations involved with the Dgroups project, but if we can't find a suitable candidate, then maybe we can find someone from this list. And as I've mentioned before, we'll fund it. So, if anyone is interested in working with Barry on Mailman/SQL in March, let me know.
- Kevin

On Fri, 2004-01-30 at 15:38, Kevin McCann wrote:
At this stage, I'd be happy with just MySQL, or more generally, database expertise. I'm at the stage in my MM3 experimentation where we need to solidify the interfaces. Code comes later, but I'd be really happy if we could come away from a MM3 sprint with solid APIs to the various data storages, and a good architecture for handling transactions across potentially disparate databases, etc. I have no problem implementing a back-end for BerkeleyDB and/or ZODB. I could probably kludge my way through a MySQL back-end (although I'm not really a huge fan of the MySQLdb package).
To Kevin and anyone else who wants to participate: please don't wait until the last minute to sign up for the sprint, or at least signal your intent. Space at Pycon will probably be limited, and I will have to take vacation if I'm going to participate on Monday and Tuesday. I'm not going to do that to sit at a table by myself though. I plan on being there the Saturday and Sunday before the conference no matter what.
Pycon sprint page: http://www.python.org/cgi-bin/moinmoin/SprintPlan2004
Mailman sprint page: http://www.python.org/cgi-bin/moinmoin/Mailman3Sprint
Please add your name to the latter if you're coming.
-Barry

On Fri, 2004-01-30 at 20:06, Barry Warsaw wrote:
A standard MTA has to obey certain rules, the most basic of which is that you do not accept a message (i.e. give a positive status to the "." at the end of the DATA section) until you have either finally delivered the message or committed it to stable storage. Mailman talks to a standard local (same or nearby box) MTA.
Lyris is unlikely to have to play this the same way.
Mailman + MTA with personalisation on has to push 50K messages (in the example griped about) to the local MTA each of which causes a batch of disk I/O with a strong synchronous component. Lyris is likely to be able to cheat like hell here.
Of course, if it's only a list box, and you don't care too much about absolute auditability through the mail delivery system, you could just switch off sync operations on that filesystem and probably get one hell of a speed up.... at the risk of interesting things happening in the case of a crash.
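As a rough illustration of the synchronous component described above, the following sketch writes a batch of small queue files with and without an fsync per file. It is not how any particular MTA manages its spool; the function name, file naming, and message counts are assumptions made for the example, and the timings will vary widely with the disk and filesystem.

```python
import os
import tempfile
import time


def spool_messages(spool_dir, count, payload, sync):
    """Write `count` queue files; force each to stable storage when `sync` is true."""
    for i in range(count):
        path = os.path.join(spool_dir, f"msg-{i:06d}")
        with open(path, "wb") as fh:
            fh.write(payload)
            if sync:
                fh.flush()             # push Python's buffer to the OS
                os.fsync(fh.fileno())  # ask the OS to commit it to disk
    return len(os.listdir(spool_dir))


if __name__ == "__main__":
    payload = b"x" * 4096  # assumed size of a small list message
    for sync in (True, False):
        with tempfile.TemporaryDirectory() as spool:
            start = time.perf_counter()
            spool_messages(spool, 200, payload, sync)
            elapsed = time.perf_counter() - start
            print(f"sync={sync}: {elapsed:.3f}s for 200 messages")
```

Scaling the per-message difference up to 50K personalized messages gives a feel for why the sync requirement dominates, and why turning it off trades safety for speed.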
Nigel.
-- [ Nigel Metheringham Nigel.Metheringham@InTechnology.co.uk ] [ - Comments in this message are my own and not ITO opinion/policy - ]

At 10:02 AM +0000 2004/02/02, Nigel Metheringham wrote:
In fact, in the case of announce-only lists of a very time-sensitive nature (e.g., sending out daily updates of the latest news over the past 24 hours that matches certain search criteria), you can do what InfoBeat/MercuryMail did -- run everything from a RAM disk. In that case, you don't care if there is a crash and millions of messages are lost, since you'll do another run tomorrow.
In fact, if you use one of the battery-backed RAM disks (solid-state disks, actually) which are supported by Linux and FreeBSD (among others), you can get up to 4GB (or more) of reliable storage that will be lightning fast, and you will have the best of all possible worlds.
This enhancement is mentioned as the final step to maximum performance gain in my slides at <http://www.shub-internet.org/brad/papers/sendmail-tuning/>. If you're going to seriously consider this route, you probably want to look at the other options, too.
The RocketDrive (see <http://www.cenatek.com/product_rocketdrive.cfm>) is one example, then there's the SolidData SSD (see <http://www.soliddata.com/products/1000/1000_specs.html>) and the RAM-SAN from Texas Memory Systems (see <http://www.superssd.com/default.asp>).
-- Brad Knowles, <brad.knowles@skynet.be>

At 12:27 PM +0100 2004/02/02, Brad Knowles wrote:
I've been looking at the requirements and potential performance you can get with Lyris ListManager, MailEngine, etc.... See <http://www.lyris.com/lm_help/7.8/Memory_And_Bandwidth_Recom.html> and <http://www.lyris.com/products/mailengine/requirements.html> for the respective requirements, and <http://www.lyris.com/products/listmanager/extreme.html> for an idea of what kind of performance they can offer.
Then look at their prices at <http://www.lyris.com/products/mailengine/prices.html>. For 500,000 messages per hour with comprehensive support, that's a software-only cost of more than $24,000 (one million messages an hour would cost over $48,000). Using SSDs and the right configuration, I can do a higher level of performance for less money, hardware and software included. Indeed, mailman would be a key part of that system.
If you want to pay commercial prices, you can get higher levels of performance and capabilities. But if you're not willing to pay those kinds of prices, you have to make some compromises.
You may rarely get what you pay for, but you almost always pay for what you get -- sometimes much more than you should.
-- Brad Knowles, <brad.knowles@skynet.be>

Brad Knowles wrote:
Another article, which might be OT to the thread but nevertheless interesting:
http://john.redmood.com/osfastest.html
It is co-authored by a Lyris dude and looks at OS choices and performance-related sub-topics.
- Kevin

At 8:45 AM -0500 2004/02/03, Kevin McCann wrote:
I was wondering when someone would bring that up.
I saw that article. I tore them several new openings, and did the same for Amber Ankerholtz (publisher of _SysAdmin_) for allowing such garbage to be published in her magazine.
Basically, these guys don't know crap, and they were using the article as an advertisement for their stuff. They should have stuck to the OSes they know and not bothered with trying to include things that they know nothing about.
I've tried to write decent quality articles for _SysAdmin_ in the past, but Amber's editorial team really let me down, and I have now sworn off them.
I am working on a book on a different subject, and I've got a booklet idea in the wings on a more closely related subject.
-- Brad Knowles, <brad.knowles@skynet.be>

On Feb 2, 2004, at 3:27 AM, Brad Knowles wrote:
It's definitely useful and a big win. It not only clears up general disk I/O but, more importantly (from what I have seen), moves certain key inodes in the delivery file structure off of disk. Since I/O operations have to lock and unlock those inodes for update, the time wasted single-threading through them goes way down. (This is why, for instance, you should generate a fairly large number of sub-queues in sendmail; if you're trying to do volume and haven't, you're being silly. It spreads the load across more than one inode.)
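The idea of spreading queue traffic over several directories, so that updates do not all contend for one directory inode, can be sketched as follows. This is an illustration of the general technique only, not sendmail's own queue-selection logic; the function name, the q00-style directory names, and the hash-based assignment are assumptions made for the example.

```python
import hashlib
import os


def queue_dir_for(message_id, base, n_queues):
    """Pick one of `n_queues` sub-queue directories for a message.

    Hashing the message id gives a stable, roughly even spread, so file
    creation and removal hit many directory inodes instead of one.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % n_queues
    return os.path.join(base, f"q{index:02d}")


if __name__ == "__main__":
    counts = {}
    for i in range(50000):
        d = queue_dir_for(f"msg-{i}@example.org", "/var/spool/mqueue", 16)
        counts[d] = counts.get(d, 0) + 1
    for d in sorted(counts):
        print(d, counts[d])
```

With a single queue directory every one of those 50,000 file operations would update the same directory; with 16 sub-queues each directory sees only a fraction of them.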

participants (10): Barry Warsaw, Bob Puff@NLE, Brad Knowles, Carson Gaspar, Chuq Von Rospach, Kevin McCann, moron, moron@industrial.org, Nigel Metheringham, Somuchfun