[Mailman-Developers] (2.0.6) pipermail takes >1 minute to rebuild indexes on large lists

Ben Gertzfield che@debian.org
Wed, 10 Oct 2001 14:44:08 +0900


>>>>> "BAW" == Barry A Warsaw <barry@zope.com> writes:
>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BG> The problem was when the mbox got up to about 200-300 megs; I
    BG> can send you the traces of the function calls with timestamps,
    BG> and you can see exactly how slow things get.

    BAW> My biggest lists are python-list at ~280MB followed by the
    BAW> zope mailing list which is at about 150MB, and I've got a
    BAW> dozen in the 10-100MB range.

    BAW> You're sure you're not gzipping on the fly, right?

Absolutely.

[ben@yuubin:/usr/lib/mailman/Mailman]% grep -i gzip Defaults.py          2:40PM
# Set this to 1 to enable gzipping of the downloadable archive .txt file.
# night to generate the txt.gz file.  See cron/nightly_gzip for details.
GZIP_ARCHIVE_TXT_FILES = 0
[ben@yuubin:/usr/lib/mailman/Mailman]% grep -i gzip mm_cfg.py            2:40PM

    BAW> It would be interesting to see some profiler output.

Here's an example from the timestamped function-call traces.  There
are megs and megs where this came from.

Sep 13 19:38:02 2001 (29454) pipelining: ToArchive
Sep 13 19:38:02 2001 (29454) forking...
Sep 13 19:38:02 2001 (29454) forked, pid 29454. calling handler func ToArchive...
Sep 13 19:38:04 2001 (29458) in Message.enqueue() now
Sep 13 19:38:04 2001 (29458) opening file: 733417dfede9cc5f09bf35f40d6c3d279830f653
Sep 13 19:38:04 2001 (29458) opening db /var/lib/mailman/qfiles/733417dfede9cc5f09bf35f40d6c3d279830f653.db
Sep 13 19:38:04 2001 (29458) exception in msg
Sep 13 19:38:04 2001 (29458) msgdata.update newdata
Sep 13 19:38:04 2001 (29458) msgdata.update kws
Sep 13 19:38:04 2001 (29458) writing data file
Sep 13 19:38:04 2001 (29458) done writing data file
Sep 13 19:38:04 2001 (29458) writing dirty/new msg to disk
Sep 13 19:38:04 2001 (29458) done writing dirty/new msg to disk
Sep 13 19:38:06 2001 (29462) in Message.enqueue() now
Sep 13 19:38:06 2001 (29462) opening file: 4a2589b46405fdf1691bb83cba6d638e718b932a
Sep 13 19:38:06 2001 (29462) opening db /var/lib/mailman/qfiles/4a2589b46405fdf1691bb83cba6d638e718b932a.db
Sep 13 19:38:06 2001 (29462) exception in msg
Sep 13 19:38:06 2001 (29462) msgdata.update newdata
Sep 13 19:38:06 2001 (29462) msgdata.update kws
Sep 13 19:38:06 2001 (29462) writing data file
Sep 13 19:38:06 2001 (29462) done writing data file
Sep 13 19:38:06 2001 (29462) writing dirty/new msg to disk
Sep 13 19:38:06 2001 (29462) done writing dirty/new msg to disk
Sep 13 19:38:59 2001 (29454) done with handler func ToArchive.

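I can explain in more detail, but the trace makes the problem pretty
obvious: the ToArchive handler is called at 19:38:02 and doesn't
finish until 19:38:59, 57 seconds later, while two other messages
get enqueued in the meantime.  ToArchive starts to thrash pretty
badly with a big mbox file.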
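
For anyone who wants to pull handler timings out of a trace like the
one above, here is a minimal sketch in Python.  It assumes every line
has the shape "Mon DD HH:MM:SS YYYY (pid) message" and that a handler
run is bracketed by "calling handler func X" and "done with handler
func X" from the same pid, as in the excerpt.  The function name
handler_durations is hypothetical and not part of Mailman, and the
month parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Assumed line shape, copied from the trace excerpt:
#   Sep 13 19:38:02 2001 (29454) forked, pid 29454. calling handler func ToArchive...
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \((\d+)\) (.*)$")
START_RE = re.compile(r"calling handler func (\w+)")
DONE_RE = re.compile(r"done with handler func (\w+)")


def handler_durations(lines):
    """Return a list of (pid, handler, seconds) for each start/done pair."""
    started = {}   # (pid, handler) -> datetime of the "calling" line
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a trace line
        stamp = datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
        pid, text = m.group(2), m.group(3)
        start = START_RE.search(text)
        done = DONE_RE.search(text)
        if start:
            started[(pid, start.group(1))] = stamp
        elif done:
            key = (pid, done.group(1))
            if key in started:
                elapsed = (stamp - started.pop(key)).total_seconds()
                results.append((pid, done.group(1), elapsed))
    return results


# Two lines taken verbatim from the trace above.
sample = [
    "Sep 13 19:38:02 2001 (29454) forked, pid 29454. calling handler func ToArchive...",
    "Sep 13 19:38:59 2001 (29454) done with handler func ToArchive.",
]

for pid, handler, seconds in handler_durations(sample):
    print(f"{handler} (pid {pid}): {seconds:.0f}s")  # prints "ToArchive (pid 29454): 57s"
```

Run over a whole trace file (for example by passing an open file
object as lines), this would list every handler run and how long it
took, which makes the slow ToArchive calls easy to spot.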

    BAW> I feel it'll tie us too closely to some other project, with
    BAW> its own agenda, schedule, compatibility issues, tool chain,
    BAW> etc. etc.  I'm under no illusions about making Pipermail a
    BAW> killer archiver, but I also don't think that most sites need
    BAW> much more.  I'd rather give folks a moderately useful,
    BAW> bundled archiver and tell them where to go if they're running
    BAW> a high traffic site.

If we go this route, we must do a big overhaul on pipermail.  It
tries to do way too much as it is, and fails spectacularly on
systems other than mine when the mbox file gets too big.

Ben

-- 
Brought to you by the letters Y and P and the number 12.
"Porcoga daisuki!"
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/