[Mailman-Developers] (2.0.6) pipermail takes >1 minute to rebuild indexes on large lists

Barry A. Warsaw barry@zope.com
Wed, 10 Oct 2001 00:04:56 -0400


>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BG> The problem was when the mbox got up to about 200-300 megs; I
    BG> can send you the traces of the function calls with timestamps,
    BG> and you can see exactly how slow things get.

My biggest lists are python-list at ~280MB followed by the zope
mailing list which is at about 150MB, and I've got a dozen in the
10-100MB range.

You're sure you're not gzipping on the fly, right?

It would be interesting to see some profiler output.
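
One way to collect that kind of output, as a minimal sketch only: it
uses the cProfile and pstats modules from a modern Python standard
library (older interpreters shipped only the slower "profile" module),
and rebuild_indexes() below is a hypothetical stand-in for the slow
archiving step, not a real Pipermail function.

```python
import cProfile
import io
import pstats


def rebuild_indexes(n_articles):
    """Hypothetical stand-in for the slow index rebuild (not Pipermail code)."""
    index = {}
    for i in range(n_articles):
        # Bucket article numbers, loosely imitating a thread/date index.
        index.setdefault(i % 100, []).append(i)
    return index


def profile_call(func, *args):
    """Run func under the profiler; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


index, report = profile_call(rebuild_indexes, 10000)
print(report)
```

Sorting by cumulative time tends to show which top-level call the time
is spent under, which is usually the first thing to look at in a
report like this.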

    BAW> If your system still can't handle things, then the next step
    BAW> is to set ARCHIVE_TO_MBOX to 1.  This way, Mailman will
    BAW> simply append the message to the .mbox file, which ought to
    BAW> be extremely quick, but it won't attempt to run the Pipermail
    BAW> archiver in real time.  Then you can use whatever archiving
    BAW> scheme you want (e.g. bin/arch nightly, or an external
    BAW> archiver).
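
As a sketch of the setting quoted above, assuming site overrides live
in mm_cfg.py as usual, the line would go below that file's existing
import of the defaults:

```python
# Fragment for mm_cfg.py (assumed location of site overrides).
# 1 = only append each message to the list's .mbox file; Pipermail is
# not run in real time, so the HTML archive has to be built separately
# (e.g. bin/arch nightly, or an external archiver).
ARCHIVE_TO_MBOX = 1
```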

    BG> Yes, this is probably the right solution.  In fact, I'm
    BG> actually leaning towards suggesting that Mailman just come
    BG> with or depend upon hypermail for archiving; we're just
    BG> re-inventing the wheel by trying to modify pipermail over and
    BG> over, and it's really not going to scale.

So far, I've resisted this.  I've no problem recommending an external
archiver for serious sites, and making it as easy as possible to
integrate Mailman with external archivers, but I really don't want to
distribute one with Mailman.

I feel it'll tie us too closely to some other project, with its own
agenda, schedule, compatibility issues, tool chain, etc. etc.  I'm
under no illusions about making Pipermail a killer archiver, but I
also don't think that most sites need much more.  I'd rather give
folks a moderately useful, bundled archiver and tell them where to go
if they're running a high traffic site.

-Barry