Mailman 3 Bounce processing observations 2.1.3 versus 2.1.1 for large lists - Mailman-Developers

22 Oct 2003

Greetings:
I currently manage a Mailman installation on a Red Hat system with a total of 229,000 subscribers spread fairly evenly over 72 announce-only lists (averaging about 3k users each).
Recently I upgraded from 2.1.1 to 2.1.3, primarily because of the fix for the cross-site scripting bug but also for the bounce processing improvements.
Prior to the update I would only run BounceRunner every 8 hours because of the large CPU and I/O load it would put on my system (90% CPU, and LOTS of disk I/O).
So far it appears that the 2.1.3 bounce processing software is MUCH faster.  In many cases it's able to process up to 15 bounces per SECOND.  Fantastic.
That compares with 6 bounces per second on 2.1.1. (This is a 2 GHz P4 with ATA100 IDE drives.)
However, with this increased performance (no doubt due to the fact that BounceRunner now registers MANY bounces for one list at a time) comes a problem, best illustrated by the following excerpt from my bounce log:
Oct 20 12:29:54 2003 (3706) Processing 1211 queued bounces
Oct 20 12:33:04 2003 (3706) bouncingsubscriber1@isp.com: momsviewf current bounce score: 5.0
.....
dozens and dozens more entries for the momsviewf list
.....
Oct 20 12:36:10 2003 (3706) bouncingsubscriber2@isp2.com: momsviewf current bounce score: 2.0
If I understand how the processing is done correctly, was the momsviewf list indeed locked for a period of 3 minutes and 6 seconds?
I also noticed that I could not access the momsviewf list via the web admin interface; it would hang on the admin password entry screen.
The momsviewf list has 4374 subs.
I did not notice this issue on 2.1.1, probably because 2.1.1 releases the list lock after every bounce.
While increasing the performance of bounce processing significantly, the changes in 2.1.3 appear to have created a lock contention issue as a side effect.
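As a rough check of the 3 minute 6 second figure, the two momsviewf timestamps from the log excerpt above can simply be subtracted. This is only a minimal sketch: the helper name seconds_between is made up for illustration and is not part of Mailman, and it assumes the timestamps are parsed under an English locale.

```python
from datetime import datetime

# Timestamp layout as it appears in the bounce log excerpt.
LOG_TIME_FORMAT = "%b %d %H:%M:%S %Y"


def seconds_between(first, last):
    """Return the elapsed seconds between two bounce-log timestamps."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# First and last momsviewf entries quoted in the excerpt.
elapsed = seconds_between("Oct 20 12:33:04 2003", "Oct 20 12:36:10 2003")
print(elapsed)  # 186.0 seconds, i.e. 3 minutes 6 seconds
```

This measures only the span between the first and last log entries for the list; whether the lock was actually held for that whole interval is the open question.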
My suggested fix, as I previously mentioned in my Jan 30, 2003 posting to this list (reproduced below), is to purposely LIMIT the number of bounces processed (preferably per list) so that the lock is released in a reasonable time period.
I would be interested in hearing other suggestions or observations about this issue.
Thanks
John
Co-webmaster
momsview.com

Initialize x to the number of bounces to process on each pass
While Forever
    Initialize Python list structure to hold bounces
    (Process x emails in the bounce queue)
    For x emails in queue
        Dequeue the message
        Extract addresses to bounce
        SAVE address and Listname in Python list structure
    If Python list structure contains emails
        For all mailing lists in Python structure
            REREAD list from disk
            LOCK the LIST
            For all addresses that bounced for this list
                Register Bounce
            SAVE the list to disk
            UNLOCK the list
    SLEEP for SLEEPTIME
CLEANUP on exit

Advantages to this method:
(1) We process a number of bounces before writing out the list, reducing I/O
(the real bottleneck) by a factor of x.  When x is one the algorithm almost
degenerates to the current method
(2) Since we always sleep on each pass it gives other processes (like the
web GUI) a chance to read the list.
(3) By increasing x we control the number of bounces that get processed on
each pass. The time it takes to extract the addresses gives other processes
time to acquire the list lock and avoid "lockout"
(4) Since "in memory" bounce registration is very fast we can do a lot of
them while the list is locked without adding significantly to the already
long lock time on a big list (I believe the I/O is the limiting factor)
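
The batching idea in the pseudocode above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Mailman code: FakeList is a hypothetical stand-in for a real list object, its lock/save/register_bounce methods are invented for the example, the "reread from disk" step is omitted, and max_passes replaces the "While Forever" loop so the sketch terminates.

```python
import time
from collections import defaultdict, deque


class FakeList:
    """Hypothetical stand-in for a mailing list; NOT Mailman's own class."""

    def __init__(self, name):
        self.name = name
        self.scores = defaultdict(float)  # address -> bounce score
        self.saves = 0                    # number of simulated disk writes
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False

    def register_bounce(self, address):
        assert self.locked, "list must be locked before registering"
        self.scores[address] += 1.0

    def save(self):
        self.saves += 1


def process_pass(queue, lists, batch_size):
    """Handle at most batch_size queued bounces, locking each list once."""
    pending = defaultdict(list)
    # Dequeue up to batch_size (listname, address) pairs and group by list.
    for _ in range(min(batch_size, len(queue))):
        listname, address = queue.popleft()
        pending[listname].append(address)
    # One lock/save/unlock cycle per list, however many addresses bounced.
    for listname, addresses in pending.items():
        mlist = lists[listname]
        mlist.lock()
        try:
            for address in addresses:
                mlist.register_bounce(address)
            mlist.save()
        finally:
            mlist.unlock()
    return sum(len(addresses) for addresses in pending.values())


def run(queue, lists, batch_size, sleep_time, max_passes):
    """Run a bounded number of passes, sleeping between them."""
    for _ in range(max_passes):
        process_pass(queue, lists, batch_size)
        time.sleep(sleep_time)  # gives other processes a chance at the lock


lists = {"momsviewf": FakeList("momsviewf")}
queue = deque([("momsviewf", "a@isp.com"),
               ("momsviewf", "b@isp2.com"),
               ("momsviewf", "a@isp.com")])
run(queue, lists, batch_size=2, sleep_time=0.01, max_passes=2)
print(lists["momsviewf"].saves)  # 2: one save per pass, not one per bounce
```

Here batch_size plays the role of x: with three queued bounces and x = 2, the list is written twice rather than three times, and the lock is released between passes.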
