New subject: Huge lists

May 25, 2000 · *real*


On Wed, 24 May 2000 20:47:21 -0700
Chuq Von Rospach <chuqui@plaidworks.com> wrote:
...
At 6:38 PM -0700 5/24/2000, J C Lawrence wrote:
...
...
True.  My curiosity however is what MTAs do MX sorting, and more
particularly, MX collapsing (e.g. for two different targets that
share an MX among their lowest level).  The potential gains
there are likely not huge, but could be (guesstimate) noticeable
for high volume servers with broad standard deviations in their
target lists.
I'll have to check into that some time.
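
To make the MX collapsing idea above concrete, here is a minimal Python
sketch.  It is not what Mailman or any particular MTA does.  The MX table
is hard-coded, hypothetical data standing in for real DNS lookups, and
"collapsing" is simplified to mean "group recipients whose domains have
the same set of lowest-preference exchangers".  The names MX_TABLE,
lowest_mx and collapse_by_mx are made up for illustration.

```python
from collections import defaultdict

# Hypothetical MX data: domain -> list of (preference, exchanger).
# A real implementation would query DNS; this table is an assumption.
MX_TABLE = {
    "example.com": [(10, "mx1.hosting.example"), (20, "mx2.hosting.example")],
    "example.org": [(5, "mx1.hosting.example")],
    "example.net": [(10, "mail.example.net")],
}


def lowest_mx(domain):
    """Return the exchangers that share the lowest preference for a domain."""
    records = MX_TABLE.get(domain, [])
    if not records:
        # No MX records known: treat the domain itself as the target.
        return frozenset([domain])
    best = min(pref for pref, _host in records)
    return frozenset(host for pref, host in records if pref == best)


def collapse_by_mx(recipients):
    """Group recipient addresses by their lowest-preference exchanger set."""
    groups = defaultdict(list)
    for addr in recipients:
        domain = addr.rsplit("@", 1)[1].lower()
        groups[lowest_mx(domain)].append(addr)
    return dict(groups)
```

With the table above, a@example.com and b@example.org land in the same
group because both resolve to mx1.hosting.example at their lowest
preference, so one connection could in principle carry both.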
...
but -- as the experts say, the first $500 buys you 90% of the
stereo response, and the rest of the money goes into getting you
as close to 100% as you can get.
Umm, true.  Looking at it again, and doing a quick check of my user
base's MXing, I suspect we're dealing with a less than 1% gain.
Bigger fish are available.  Methinks my brain was farting.
...
MX sorting is definitely far up into that 90% range,
computationally and time expensive, and lots of other stuff can be
done first, with more gain, and less effort.
I don't believe that a list server has any business handling MX
sorting unless it is also taking responsibility for being the list
MTA.  As Mailman isn't, it's a moot point.
...
Maybe one thing we need is a definition of what Mailman is and
what it isn't. Some kind of target for the size of lists it wants
to reasonably support. If it's 5,000 users, it doesn't matter what
you do. If it's 50,000, or 500,000, you definitely have different
requirements.
While I really have no say here, were I Barry and Co I'd be
comfortable with targeting Mailman as able to handle a mid/high
6-digit subscriber base list on mid-range PC-class hardware given
suitable system configuration.  That wouldn't be the target of
course, just the "it must be physically able to work here" metric.
...
(being able to handle a moderately busy 25,000 user list, say
15-30 messages a day...
Average traffic levels are never the problem.  It's the bursts you
have to worry about, especially given the enforced latency of a
moderated list and the resultant likely grouping of broadcasts.  I
usually end up moderating/approving messages in groups of 5 making
bursts of ~5K messages to the MTA (current largest list has a little
under 1K subscribers).  It is the burst aspect that's possibly the
main reason the MTA delivery process needs to be made asynchronous
from the rest of the list server.
...
It'd be nice to be able to say "5 million subscribers in 2
minutes!", but focus on a solid "do most things for most folks"
now, and add the high performance/huge list support in 2.5. But
leave the hooks in, so we don't have to rewrite later....)
Were delivery to the MTA separated from the receipt or CGI process
(i.e. mail is received, the RCPT list attached to it, and the tuple
placed on a queue for background processing via forked process or
cron job), we wouldn't be having this discussion.  It's a fairly
invasive change to the current Mailman architecture, but making the
whole receipt/broadcast aspect asynchronous offers some really
pleasant future avenues.
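
A rough sketch of that decoupling in Python, under stated assumptions: the
queue is a plain spool directory, each job is a JSON file holding the
message text and its RCPT list, and the actual handoff to the MTA is a
caller-supplied deliver() function.  None of these names or the file
layout come from Mailman; they are hypothetical and only illustrate the
shape of the idea.

```python
import json
import os
import tempfile
import uuid


def enqueue(spool_dir, message_text, recipients):
    """Store the (message, RCPT list) tuple in the spool and return at once.

    The receiving or CGI process calls this and never talks to the MTA.
    """
    entry = {"message": message_text, "rcpts": list(recipients)}
    final_path = os.path.join(spool_dir, uuid.uuid4().hex + ".job")
    # Write to a temp file first, then rename, so the background
    # process never picks up a half-written job.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(entry, fh)
    os.rename(tmp_path, final_path)
    return final_path


def drain(spool_dir, deliver):
    """Background side (cron job or forked process): deliver queued jobs.

    Jobs are visited in filename order, which is not arrival order.
    Returns the number of jobs handed to deliver().
    """
    delivered = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".job"):
            continue
        path = os.path.join(spool_dir, name)
        with open(path) as fh:
            entry = json.load(fh)
        deliver(entry["message"], entry["rcpts"])
        # Remove the job only after deliver() returned without raising.
        os.remove(path)
        delivered += 1
    return delivered
```

A burst of approvals then costs the web or receipt side only a few file
writes; the MTA handoff happens whenever drain() next runs.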
...
...
While it's a cheap logic, it's easy to note that none of the very
high volume commercial email sites out there are based on
Sendmail (Critical Path, Hotmail, Onelist, EGroups, etc).
...
Valid points.
<chortle>
...
Postfix looks like a *real* win, but until I run it through its
paces, I won't use it.
Exactly where I'm at on it.  I'm about to roll my desktops over to
it, and let it stew there for a couple weeks.
...
I'm doing 400-500,000 an hour out of my mail system without trying
too hard, using sendmail 8.9.3, and peaks approaching 900K.
My traffic on Kanga.Nu (hobby lists: http://www.kanga.nu/lists/listinfo/)
is bursty and low enough that I just never get any hours with
solidly active spools.  I average around 30K - 40K deliveries per
hour with the MTA sitting idle for much of that hour.  96% of
messages are delivered within 60 seconds of hitting the queue, 98.1%
within the hour -- we're basically talking a pretty idle mail
system.
...
So eking out more performance by swapping MTAs is not a priority)
That's one of the main reasons I've been so lackadaisical about
moving to Postfix -- I don't really need to.  The only thing driving
it is my own interest.
...
...
...
As someone who deals with email for a living...
I should probably note at this point that I'm working for
Critical Path on their mail systems.
NB as a contractor.
...
As long as we're into disclosure, I run a bunch of hobby lists at
plaidworks.com...
Just been poking around there and noticed that your archives seem to
be inop (dead disk).  If you're interested I've been messing about
with MHonArc and PHP in my spare time and have almost finished
getting a setup that:
-- Allows archived messages to be replied to on the web via the
archive page (replies post to the list).
-- Templates (PHPLIB) the entire archive appearance.  All MHonArc
does is the parsing and data extraction.
-- Supports archive searching by MessageID.  I've an MTA hack that
inserts a MessageID-based URL into all outgoing Mailman
list traffic so the user can just hit the URL and be taken to
that message in the archives (searches the MHonArc DB, useful
for thread reference etc).
Hopefully I'll get something worth public viewing sometime next
week.
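
For the MessageID-based URL item above, a small Python sketch of the
general idea, not the actual MTA hack.  The archive search URL and the
X-Archive-URL header name are both hypothetical, and where the real hack
places the URL (header or body) is not stated here, so a header is simply
assumed.

```python
from email import message_from_string
from urllib.parse import quote

# Hypothetical archive search endpoint; not the real kanga.nu URL.
ARCHIVE_SEARCH = "http://lists.example.invalid/archive/msgid.php?id="


def add_archive_url(raw_message):
    """Return a parsed message with an archive URL derived from Message-ID.

    The header name X-Archive-URL is an assumption for illustration.
    Messages without a Message-ID are returned unchanged.
    """
    msg = message_from_string(raw_message)
    msgid = msg.get("Message-ID", "").strip().strip("<>")
    if msgid:
        # Percent-encode everything, including "@", so the ID is URL-safe.
        msg["X-Archive-URL"] = ARCHIVE_SEARCH + quote(msgid, safe="")
    return msg
```

Calling as_string() on the returned message gives the text to pass on for
delivery.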
--
J C Lawrence                                 Home: claw@kanga.nu
----------(*)                              Other: coder@kanga.nu
--=| A man is as sane as he is dangerous to his environment |=--
