[Mailman-Developers] (2.0.6) pipermail takes >1 minute to rebuild indexes on large lists

Barry A. Warsaw barry@zope.com
Fri, 12 Oct 2001 16:20:49 -0400

>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BG> The problem was when the mbox got up to about 200-300 megs; I
    BG> can send you the traces of the function calls with timestamps,
    BG> and you can see exactly how slow things get.

>>>>> "BAW" == Barry A Warsaw <barry@zope.com> writes:

    BAW> My biggest lists are python-list at ~280MB followed by the
    BAW> zope mailing list which is at about 150MB, and I've got a
    BAW> dozen in the 10-100MB range.

    BAW> It would be interesting to see some profiler output.

    BG> Here's an example.  There are megs and megs where this came
    BG> from..

[profiling deleted]

    BG> I can explain in more detail, but it's pretty obvious that
    BG> ToArchive starts to thrash pretty badly with a big mbox file.

I think you need to investigate this more.  I'd like to see exactly
how you instrumented ToArchive.py to get these numbers.  I think
something else is going on with your system.

Here's what I did: I took python-list.mbox from mail.python.org.  This
is about 280MB.  I installed that as the mbox file for a local test
list, and ran bin/arch on it to initialize the archive.

Then I instrumented MM2.0.6's ToArchive.py like so:

        # TBD: this needs to be converted to the new pipeline machinery
        t0 = time.time()
        mlist.ArchiveMail(msg, msgdata)
        t1 = time.time()
        syslog('debug', 'ArchiveMail time: %s seconds' % (t1 - t0))

On an unloaded system, this took 1.08 seconds, much less than the
53 seconds between these two lines in your output:

-------------------- snip snip --------------------
Sep 13 19:38:06 2001 (29462) done writing dirty/new msg to disk
Sep 13 19:38:59 2001 (29454) done with handler func ToArchive.
-------------------- snip snip --------------------

When I send 3 or 4 messages into the queue at the same time, the
average time in ArchiveMail() is 0.2 seconds.  I could try
instrumenting ToArchive.py on the live site, but I suspect I'll get
very similar numbers.
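
For anyone who wants to repeat this kind of measurement outside of
Mailman, here is a minimal standalone sketch of the same idea: time
each call, then average over a few messages.  It assumes a modern
Python 3 interpreter (hence time.perf_counter() rather than
time.time()).  The names timed_call, average_elapsed and
archive_mail_stub are hypothetical and are not part of Mailman; the
stub only simulates work in place of mlist.ArchiveMail().

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - t0


def archive_mail_stub(msg):
    # Hypothetical stand-in for mlist.ArchiveMail(); it only sleeps
    # briefly to simulate work and returns the message length.
    time.sleep(0.01)
    return len(msg)


def average_elapsed(func, inputs):
    """Average wall-clock time of func(item) over all items in inputs."""
    timings = [timed_call(func, item)[1] for item in inputs]
    return sum(timings) / len(timings)


if __name__ == "__main__":
    messages = ["msg one", "msg two", "msg three"]
    avg = average_elapsed(archive_mail_stub, messages)
    print("average ArchiveMail time: %.3f seconds" % avg)
```

Note that this runs the calls one after another, so it measures the
per-call cost only; it does not reproduce several messages contending
in the queue at the same time.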

Also, your output implies there's some forking going on.  Where's that
happening?  The only forking the MM2.0.6 code base does is in the
ToUsenet.py handler (oh, and in the test cases for LockFile.py, but
those obviously don't count).
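
As a rough cross-check of the two numbers discussed above, the sketch
below parses the two quoted log lines, computes the gap between their
timestamps, and collects the numbers in parentheses, which appear to
be process ids.  The regular expression is an assumption based only
on the layout of those two lines, and parse_line and summarize are
hypothetical helper names, not Mailman functions.  It assumes Python
3 and an English/C locale for the month abbreviation.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the two quoted lines:
#   "Mon DD HH:MM:SS YYYY (pid) message"
LOG_LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \((\d+)\) (.*)$")

SAMPLE = [
    "Sep 13 19:38:06 2001 (29462) done writing dirty/new msg to disk",
    "Sep 13 19:38:59 2001 (29454) done with handler func ToArchive.",
]


def parse_line(line):
    """Return (timestamp, pid, message) for one log line."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized log line: %r" % line)
    stamp = datetime.strptime(match.group(1), "%b %d %H:%M:%S %Y")
    return stamp, int(match.group(2)), match.group(3)


def summarize(lines):
    """Return (seconds between first and last line, sorted distinct pids)."""
    parsed = [parse_line(line) for line in lines]
    gap = (parsed[-1][0] - parsed[0][0]).total_seconds()
    pids = sorted({pid for _, pid, _ in parsed})
    return gap, pids


if __name__ == "__main__":
    gap, pids = summarize(SAMPLE)
    print("gap: %.0f seconds, pids seen: %s" % (gap, pids))
```

Two distinct values in the pid column mean the two lines were written
by different processes, which is what points to forking somewhere.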