[ mailman-Bugs-1168999 ] OutgoingRunner gets in expensive recursive loop

Tue Apr 26 09:30:36 CEST 2005

Bugs item #1168999, was opened at 2005-03-23 18:30
Category: mail delivery
Group: 2.1 beta
Status: Open
Resolution: None
Priority: 5
Submitted By: Kim Davies (kjd)
Assigned to: Nobody/Anonymous (nobody)
Summary: OutgoingRunner gets in expensive recursive loop

Initial Comment:
I have had a problem spring up the past few weeks where
the OutgoingRunner gets in a loop which effectively
brings down the machine by spiking the CPU to 99%.
Running "strace" on the process I see it constantly
deleting and recreating the same queue file in
qfiles/out/ over and over, many times per second.

The initial problem occurred with a 2.1.2 install, but
installing 2.1.6b4 shows the same.

Unfortunately the problem is somewhat ephemeral when
trying to diagnose it - if I manage to kill the
OutgoingRunner between a read and write, the queue file
gets lost and the problem disappears for a while.

I don't know if it is useful, but attached is the
strace output of a complete read/write cycle. I haven't
had the opportunity to debug it further (by stepping
through the Python) as the system is not currently in
this state. I am not sure how long it will be until it
is triggered again, but it has happened about 4 times in
the past two weeks. It had never occurred before this in
over 3 years.

I consider this issue fairly problematic - the machine
becomes unusable when it reaches this state due to CPU load.

Any tips to help isolate the problem are welcome. I
have modified mailmanctl to run all queuerunners with a
verbose flag, so next time maybe there will be useful
information logged.


>Comment By: Kim Davies (kjd)
Date: 2005-04-26 15:30

Logged In: YES 

I have managed to capture a number of qfiles that are
causing this phenomenon (which has been recurring more often
in the past few weeks). They have the following properties:

- They are all "Post by non-member to a members-only list"
  responses to spam that has gone to a moderated list.
- They come from nonexistent domains.

Here is a sample dump of one of the .db files from the queue
that is looping:

$ /usr/local/mailman/bin/dumpdb
{   'deliver_after': 1114503867.0899999,
    'deliver_until': 1114932267.0899999,
    'lang': 'en',
    'last_recip_count': 1,
    'listname': 'ga',
    'nodecorate': 1,
    'original_sender': 'ga-bounces at lists.centr.org',
    'personalize': 1,
    'pipeline': [],
    'received_time': 1114500259.9807329,
    'recips': ['sidfks at sklfislxkd.com'],
    'reduced_list_headers': 1,
    'verp': 1,
    'version': 3}
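
The three timestamps in the dump are Unix epoch seconds, so the retry
schedule attached to this message can be read off by subtracting
'received_time' from the other two. Below is a minimal Python 3 sketch
that uses only the values from the dump; the names "meta" and "offset"
are made up for illustration. That the resulting offsets correspond to
the DELIVERY_RETRY_WAIT and DELIVERY_RETRY_PERIOD settings is an
assumption, not something the dump itself confirms.

```python
from datetime import timedelta

# Values copied verbatim from the dumpdb output above.
meta = {
    'received_time': 1114500259.9807329,
    'deliver_after': 1114503867.0899999,
    'deliver_until': 1114932267.0899999,
}


def offset(key):
    """Return how long after 'received_time' the given timestamp falls,
    rounded to whole seconds. (Hypothetical helper for this example.)"""
    return timedelta(seconds=round(meta[key] - meta['received_time']))


retry_wait = offset('deliver_after')    # earliest time of the next attempt
retry_window = offset('deliver_until')  # point at which retries are abandoned

print('deliver_after is received_time +', retry_wait)
print('deliver_until is received_time +', retry_window)
```

For this message the offsets come out at roughly one hour and five
days, i.e. the entry is marked as not due for another delivery attempt
until about an hour after it was received.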


