[Mailman-Users] Problem with archrunner using large %'s of cpu (read faq & archives)
Richard Barrett
r.barrett at openinfo.co.uk
Fri Oct 31 22:35:24 CET 2003
On Friday, October 31, 2003, at 08:52 pm, Scott Lambert wrote:
> On Fri, Oct 31, 2003 at 09:40:11AM -0500, Jon Carnes wrote:
>> On Fri, 2003-10-31 at 09:26, Jay West wrote:
>>> I'm using Mailman 2.1.2 on FreeBSD v4.8-Release, built using the
>>> port. The MTA is sendmail 8.12.8p1.
>>>
>>> Very frequently I will see the ArchRunner process using 99+% of
>>> CPU. I have searched the archives and found lots of messages about
>>> qrunners using large percentages of CPU, but they all seem to talk
>>> about the fixes being related to actual mail processing (sendmail),
>>> not ArchRunner. I am assuming that if the problem was mail delivery
>>> or reception I would be seeing the large CPU use on a different
>>> qrunner process. My issue is specific to the ArchRunner process,
>>> which I don't find much on in the archives/faq.
>>>
>>>
>> Well you've pegged it. That was a bug in version 2.1.2 which is
>> fixed in 2.1.3. The patch for 2.1.2 should still be available - you
>> could probably patch your running system and just leave it at that
>> (an upgrade will bring the patch in anyway).
>
> I still see this problem with Mailman 2.1.3 for a high-volume list.
>
>   PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 66428 mailman   64    0  168M   147M CPU1   0 376.7H 99.02% 99.02% python2.3
>
> That's the archiver process. There are 1318 messages in the archive
> queue...
>
> 12:00:28 Fri Oct 31 # truss -p 66428
> break(0x114f6000) = 0 (0x0)
> break(0x1302c000) = 0 (0x0)
> break(0x114f8000) = 0 (0x0)
> break(0x13030000) = 0 (0x0)
> break(0x114fa000) = 0 (0x0)
> break(0x13034000) = 0 (0x0)
> break(0x114fc000) = 0 (0x0)
> break(0x13038000) = 0 (0x0)
> break(0x114fe000) = 0 (0x0)
> break(0x1303c000) = 0 (0x0)
> break(0x11500000) = 0 (0x0)
> break(0x13040000) = 0 (0x0)
> break(0x11502000) = 0 (0x0)
> break(0x13044000) = 0 (0x0)
> break(0x11504000) = 0 (0x0)
> break(0x13048000) = 0 (0x0)
> break(0x11506000) = 0 (0x0)
> break(0x1304c000) = 0 (0x0)
>
> Once I kill off the mailman queue runners and clean up the several lock
> files for this mailing list, it runs just fine and manages to empty the
> archive queue.
>
> Two days' worth of mailman cron jobs were still stuck in the process
> list.
>
> Supposition: Maybe they were blocked by the list's lockfile?
>
> So, it seems that the ArchRunner process went off the deep end
> somewhere between two and three days ago.
>
> I have the htdig patches for 2.1.3 installed, which might be
> germane...
If you are referring to patch #444884 then, while I would never say
never, it is unlikely to be the cause. The code inserted by patch
#444884 impinges very little on the execution path taken when mail is
archived and archive pages are generated by pipermail. If you find
otherwise, let me know and I'll take another look at the htdig
integration patch.
You say you have the problem with a high-volume list. What sort of
message sizes and traffic volume is the list handling? Do the messages
tend to have large attachments? I have found that the internal
pipermail archiver starts to choke on high-volume lists; on at least
one such list that I run, the solution I adopted was to reduce the
archiving period from a month to a week, which seemed to alleviate the
problem. I suspect the problem is partly related to the pickled data
structures that pipermail uses to control archiver operation and index
generation.
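In case it helps, that archiving period is the per-list
archive_volume_frequency setting, reachable under Archiving Options in
the web admin or from the command line with bin/config_list. A rough
sketch of the command-line route, untested on your setup and assuming a
hypothetical list called mylist (in the 2.1 GUI the choices run
Yearly=0, Monthly=1, Quarterly=2, Weekly=3, Daily=4, if memory serves):

   # switch the list's new archive volumes from monthly to weekly
   $ echo 'archive_volume_frequency = 3' > /tmp/weekly.cfg
   $ $prefix/bin/config_list -i /tmp/weekly.cfg mylist

Only new postings land in the weekly volumes; the existing monthly
volumes should stay as they are unless you rebuild with bin/arch.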
I'm now using a fairly tight Mailman/MHonArc integration for such
lists; I developed it because MHonArc has a reputation for handling
large archives better than pipermail, but I still wanted MM list
archive privacy, my htdig integration, etc. A patch for this is
available at http://www.openinfo.co.uk/mailman/patches/mhonarc/index.html
or as MM patch #820723 on SourceForge. It subcontracts MHonArc to
generate the message and period index pages in the normal
$prefix/archives/private/<listname>/<archive-period> directory
structure, while the pipermail/MM code looks after the top-level index,
archive control and access control. The integration makes the choice of
pipermail or MHonArc a per-list option, so if you change your mind or
decide it was all a big mistake it is not a disaster: select the
archiver you want and run $prefix/bin/arch --wipe to have it regenerate
the list archive from the list's mbox file.
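To give the flavour of that last step, regenerating a hypothetical list
called mylist with whichever archiver is currently selected would look
something like this (bin/arch will work out the default mbox path for
itself if you leave the last argument off):

   $ cd $prefix
   # --wipe removes the existing HTML archive first, then rebuilds it
   # from the list's cumulative mbox
   $ bin/arch --wipe mylist archives/private/mylist.mbox/mylist.mbox

Bear in mind the archive is unavailable while it rebuilds, and on a big
mbox that can take a while.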
So far this MM/MH integration has worked OK for me but that's a single
data point.
Enough over-selling of a free product and the usual caveat emptor :)
but if you give it a try let me know how you get on.
>
> --
> Scott Lambert  KC5MLE  Unix SysAdmin
> lambert at lambertfam.org
-----------------------------------------------------------------------
Richard Barrett http://www.openinfo.co.uk