[Mailman-Developers] Re: [Mailman-Users] speeding up archiver in mailman 2.1b3?

Fri Oct 25 21:28:09 2002

I've actually got over 3000 files in the archive queue now :)
But setting ARCHIVE_TO_MBOX = 1 (skipping pipermail archiving) seems to 
speed things up vastly: running qrunner -r Arch -o processes one message 
in less than 1 second, as opposed to over 5 minutes.  I'm running it right 
now without -o (endless loop) and it's chugging along at a wonderful pace.  
It just finished in about 10 seconds.  So there ya go.  It's definitely 
pipermail archiving that's the very expensive operation.
I can always run bin/arch to build the pipermail archive after the fact.  I 
wanted to look into using MHonArc anyhow.
I'll look into fixing mailman too ;)
Thanks very much for your help.
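
For anyone finding this in the archives, the setting described above would look roughly like the fragment below. This is a sketch that assumes a stock Mailman 2.1 install where site overrides live in Mailman/mm_cfg.py; the meanings of the values are as I recall them from Defaults.py, so check your own copy before relying on them.

```python
# Sketch of a site override, assumed to live in Mailman/mm_cfg.py.
# In a real mm_cfg.py this comes after the usual "from Defaults import *".
#
# Values as I recall them from Defaults.py (verify against your install):
#   -1  no archiving at all
#    0  pipermail HTML archive only, no mbox
#    1  mbox only (pipermail is skipped)
#    2  both mbox and pipermail
ARCHIVE_TO_MBOX = 1

# The commands mentioned in this message, run from the Mailman install dir:
#   bin/qrunner -r Arch -o    process the archive queue once
#   bin/qrunner -r Arch       same, without -o (loops)
#   bin/arch <listname>       rebuild the pipermail archive from the mbox later
```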

--
Andrew Clark
Campus Network Programmer
Office of Information Technology
University of California, Santa Barbara
andrew.clark@ucsb.edu (805) 893-5311

--On Friday, October 25, 2002 16:15:06 -0400 "Barry A. Warsaw" 
<barry@python.org> wrote:

>
>>>>>> "ADC" == Andrew D Clark <andrew.clark@ucsb.edu> writes:
>
>     ADC> Since I'm hopelessly backlogged in my archive queue (1644
>     ADC> files), does anyone have any suggestions for speeding up
>     ADC> archiving?  The qrunner process is certainly eating up CPU
>     ADC> and memory, but is only archiving about 1 msg per minute.
>     ADC> All the other queues move at a decent pace.
>
> Here's a thought, if you're interested in hacking some code.
>
> In Mailman/Archiver/Archiver.py, ArchiveMail() we create a new
> HyperArchive instance each time we want to add a new message to the
> archive.  That in turn creates a new HyperDatabase instance, which in
> turn un-marshals all the state of the archiver.
>
> I wonder if it wouldn't make more sense if the archiver stored the
> HyperArchive instance on self and re-used it.  That might save a lot
> of i/o, although I don't know if it would help much with overall
> performance, and I don't know if Pipermail would still operate
> correctly.  It's worth a shot.
>
> Another idea would be to change the scheme the Pipermail archiver uses
> from a one-file-per-message scheme to a Unix mailbox scheme.  The
> basic idea would be for ToArchive.py to append the message to a Unix
> mbox, and then have ArchRunner.py slurp a multi-message mbox into the
> archive instead of doing one message at a time.
>
> I don't have time to play with these ideas, so I'm cc'ing
> mailman-developers, in case anyone wants to do some hacking and
> profiling.
>
> -Barry
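
The two ideas in the quoted message can be sketched in a few lines of standalone Python. This is not Mailman code: ArchiveState, CachingArchiver, queue_message and drain_queue are hypothetical names made up for illustration, and ArchiveState only stands in for HyperArchive so that the number of expensive loads can be counted. It uses the standard library mailbox module, and whether Pipermail would tolerate a long-lived instance is still the open question raised above.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


class ArchiveState:
    """Hypothetical stand-in for HyperArchive; constructing it is the costly step."""

    loads = 0  # how many times the archive state was (re)loaded

    def __init__(self, listname):
        ArchiveState.loads += 1
        self.listname = listname
        self.subjects = []

    def add(self, msg):
        self.subjects.append(msg["Subject"])


class CachingArchiver:
    """Idea 1: keep the archive object on self and reuse it for every message."""

    def __init__(self, listname):
        self.listname = listname
        self._archive = None

    def archive(self, msg):
        if self._archive is None:  # load the state only on first use
            self._archive = ArchiveState(self.listname)
        self._archive.add(msg)


def queue_message(mbox_path, msg):
    """Idea 2, producer side: append one message to a Unix mbox file."""
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


def drain_queue(mbox_path, archiver):
    """Idea 2, consumer side: archive every queued message in one pass."""
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        count = 0
        for msg in box:
            archiver.archive(msg)
            count += 1
        box.clear()  # empty the queue once everything is archived
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count


def demo():
    """Queue three messages, drain them in one batch, report what happened."""
    ArchiveState.loads = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "queue.mbox")
        for i in range(3):
            msg = EmailMessage()
            msg["From"] = "poster@example.com"
            msg["Subject"] = "message %d" % i
            msg.set_content("body %d\n" % i)
            queue_message(path, msg)
        archiver = CachingArchiver("test-list")
        processed = drain_queue(path, archiver)
        return processed, ArchiveState.loads, archiver._archive.subjects
```

Calling demo() queues three messages and drains them in a single pass while constructing the stand-in archive object only once. In the real archiver, the saving would depend on how much of the per-message cost is the un-marshalling rather than the HTML regeneration, which is what profiling would need to show.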