[Mailman-Users] Problem with archrunner using large %'s of cpu (read faq & archives)

Richard Barrett r.barrett at openinfo.co.uk
Fri Oct 31 22:35:24 CET 2003


On Friday, October 31, 2003, at 08:52 pm, Scott Lambert wrote:

> On Fri, Oct 31, 2003 at 09:40:11AM -0500, Jon Carnes wrote:
>> On Fri, 2003-10-31 at 09:26, Jay West wrote:
>>> I'm using Mailman 2.1.2 on FreeBSD v4.8-Release, built using the port.
>>> The MTA is sendmail 8.12.8p1.
>>>
>>> Very frequently I will see the ArchRunner process using 99+% of CPU. I
>>> have searched the archives and found lots of messages about qrunners
>>> using large percentages of CPU, but they all seem to talk about the
>>> fixes being related to actual mail processing (sendmail), not
>>> ArchRunner. I am assuming that if the problem were mail delivery or
>>> reception I would be seeing the large CPU use on a different qrunner
>>> process. My issue is specific to the ArchRunner process, which I don't
>>> find much on in the archives/FAQ.
>>>
>> Well, you've pegged it.  That was a bug in version 2.1.2 which is fixed
>> in 2.1.3.  The patch for 2.1.2 should still be available - you could
>> probably patch your running system and just leave it at that (an
>> upgrade will bring the patch in anyway).
>
> I still see this problem with Mailman 2.1.3 for a high-volume list.
>
>   PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 66428 mailman   64   0   168M   147M CPU1   0 376.7H 99.02% 99.02% python2.3
>
> That's the archiver process.  There are 1318 messages in the archive
> queue...
>
> 12:00:28 Fri Oct 31 # truss -p 66428
> break(0x114f6000)                                = 0 (0x0)
> break(0x1302c000)                                = 0 (0x0)
> break(0x114f8000)                                = 0 (0x0)
> break(0x13030000)                                = 0 (0x0)
> break(0x114fa000)                                = 0 (0x0)
> break(0x13034000)                                = 0 (0x0)
> break(0x114fc000)                                = 0 (0x0)
> break(0x13038000)                                = 0 (0x0)
> break(0x114fe000)                                = 0 (0x0)
> break(0x1303c000)                                = 0 (0x0)
> break(0x11500000)                                = 0 (0x0)
> break(0x13040000)                                = 0 (0x0)
> break(0x11502000)                                = 0 (0x0)
> break(0x13044000)                                = 0 (0x0)
> break(0x11504000)                                = 0 (0x0)
> break(0x13048000)                                = 0 (0x0)
> break(0x11506000)                                = 0 (0x0)
> break(0x1304c000)                                = 0 (0x0)
>
> Once I kill off the mailman queue runners and clean up the several lock
> files for this mailing list, it runs just fine and manages to empty the
> archive queue.
>
> Two days' worth of mailman cron jobs were still stuck in the process
> list.
>
> Supposition: Maybe they were blocked by the list's lockfile?
>
> So, it seems that the ArchRunner process went off the deep end somewhere
> between two and three days ago.
>
> I have the htdig patches for 2.1.3 installed, which might be germane...

If you are referring to patch #444884 then, while I would never say 
never, it is unlikely to be the cause. The code inserted by patch 
#444884 impinges very little on the execution path taken when mail is 
being archived and archive pages are being generated by pipermail. If 
you discover anything to the contrary, let me know and I'll take 
another look at the htdig integration patch.

You say you have the problem with a high-volume list. What sort of 
message sizes and traffic volume is the list handling? Do the messages 
tend to have large attachments? I have found that the internal 
pipermail archiver starts to choke on high-volume lists. On at least 
one such list that I run, the solution I adopted was to reduce the 
archiving period from a month to a week, which seemed to alleviate the 
problem. I suspect the problem is partly related to the pickled data 
structures that pipermail uses to control archiver operation and index 
generation.
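
For what it's worth, that archiving period is the per-list 
archive_volume_frequency setting (Archiving Options in the web admin). 
A minimal sketch of changing it from the command line with 
bin/config_list, assuming a standard 2.1 install prefix and a 
placeholder list name "mylist":

    # volume frequency: 0=Yearly, 1=Monthly, 2=Quarterly, 3=Weekly, 4=Daily
    $ echo 'archive_volume_frequency = 3' > /tmp/weekly.cfg
    $ cd $prefix
    $ bin/config_list -i /tmp/weekly.cfg mylist

If memory serves, the new period only applies to messages archived from 
that point on; existing volumes stay as they are unless you rebuild the 
archive with bin/arch.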

I'm now using a fairly tight Mailman/MHonArc integration for such 
lists; I developed it because MHonArc has a reputation for handling 
large archives better than pipermail, but I still wanted MM list 
archive privacy, my htdig integration, etc. A patch for this is 
available at http://www.openinfo.co.uk/mailman/patches/mhonarc/index.html 
or as MM patch #820723 on SourceForge. It subcontracts MHonArc to 
generate the message and period index pages in the normal 
$prefix/archives/private/<listname>/<archive-period> directory 
structure, while the pipermail/MM code looks after the top-level index, 
archive control and access control. The integration makes the choice of 
pipermail or MHonArc a per-list option, so if you change your mind or 
decide it was all a big mistake it is not a disaster: select the 
archiver of choice and run $prefix/bin/arch --wipe to have it 
regenerate the list archive from its mbox file.
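
If you do switch a list over, a rough sketch of that rebuild step, 
again assuming a standard 2.1 layout and a placeholder list name 
"mylist":

    $ cd $prefix
    $ bin/mailmanctl stop      # keep ArchRunner away from the archive mid-rebuild
    $ bin/arch --wipe mylist   # rebuilds from archives/private/mylist.mbox/mylist.mbox
    $ bin/mailmanctl start

By default bin/arch reads the list's cumulative mbox; you can also give 
it an explicit mbox path as a second argument.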

So far this MM/MH integration has worked OK for me, but that's a single 
data point.

Enough over-selling of a free product, and the usual caveat emptor 
applies :) but if you give it a try, let me know how you get on.
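
One other thought on the manual recovery you describe (killing the 
queue runners and clearing the list's stale locks): something along 
these lines is roughly what I do, again assuming a standard 2.1 layout 
and the placeholder list name "mylist":

    $ cd $prefix
    $ bin/mailmanctl stop      # stops all the qrunners, including ArchRunner
    $ ls -l locks/             # look for stale mylist.lock / mylist.lock.* files
    $ rm -f locks/mylist.lock locks/mylist.lock.*
    $ bin/mailmanctl start

The per-list lock files sit in $prefix/locks/ alongside the 
master-qrunner lock; only remove them once you are sure nothing is 
still holding them.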

>
> -- 
> Scott Lambert                    KC5MLE                    Unix SysAdmin
> lambert at lambertfam.org
-----------------------------------------------------------------------
Richard Barrett                               http://www.openinfo.co.uk
