Bounce processing observations 2.1.3 versus 2.1.1 for large lists
Greetings:
I currently manage a Mailman installation on a Red Hat system with a total of 229,000 subs spread fairly evenly over 72 announce-only lists (averaging 3k users each).
Recently I upgraded from 2.1.1 to 2.1.3, primarily because of the fix for the cross-site scripting bug but also for the bounce processing improvements.
Prior to the update I would only run BounceRunner every 8 hours because of the large CPU and I/O load it would put on my system (90% CPU, and LOTS of disk I/O).
So far it appears that the 2.1.3 bounce processing software is MUCH faster. In many cases it's able to process up to 15 bounces per SECOND. Fantastic. That is versus 6 bounces per second on 2.1.1. (This is a 2 GHz P4 with ATA100 IDE drives.)
However, with this increased performance (no doubt due to the fact that BounceRunner registers MANY bounces for one list at a time) comes a problem best illustrated by the following excerpt from my bounce log:
Oct 20 12:29:54 2003 (3706) Processing 1211 queued bounces
Oct 20 12:33:04 2003 (3706) bouncingsubscriber1@isp.com: momsviewf current bounce score: 5.0
..... dozens and dozens more entries for the momsviewf list .....
Oct 20 12:36:10 2003 (3706) bouncingsubscriber2@isp2.com: momsviewf current bounce score: 2.0
If I understand how the processing is done correctly, was the momsviewf list indeed locked for a period of 3 minutes and 6 seconds?
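For reference, the 3 minutes and 6 seconds figure comes from the first and last momsviewf entries in the excerpt above. A small sketch of that arithmetic, assuming the timestamp layout shown in the log and an English locale for the month abbreviation:

```python
from datetime import datetime

# Timestamp layout as it appears in the bounce log excerpt.
# Note: %b depends on the locale; "Oct" parses under an English/C locale.
FMT = "%b %d %H:%M:%S %Y"

first = datetime.strptime("Oct 20 12:33:04 2003", FMT)  # first momsviewf entry
last = datetime.strptime("Oct 20 12:36:10 2003", FMT)   # last momsviewf entry

held = last - first
print(held)                  # 0:03:06
print(held.total_seconds())  # 186.0
```

This only measures the span between the two log lines; whether the lock was actually held for that whole span is the question being asked.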
I also noticed that I could not access the momsviewf list via the web admin interface; it would hang on the admin password entry screen.
The momsviewf list has 4374 subs.
I did not notice this issue on 2.1.1, probably because it releases the list lock after every bounce.
While increasing the performance of bounce processing significantly, the changes in 2.1.3 appear to have created a lock contention issue as a side effect.
My suggested fix, as I previously mentioned in my Jan 30, 2003 posting to this list (reproduced below), is to purposely LIMIT the number of bounces processed (preferably per list) so that the lock is released in a reasonable time period.
I would be interested in hearing other suggestions or observations about this issue.
Thanks
John
Co-webmaster momsview.com
Initialize x to number of bounces to process on each pass
While Forever
    Initialize Python list structure to hold bounces
    (Process x emails in the bounce queue)
    For x emails in queue
        Dequeue the message
        Extract addresses to bounce
        SAVE address and Listname in Python list structure
    If Python List structure contains emails
        For all mailing lists in Python structure
            REREAD list from disk
            LOCK the LIST
            For all addresses that bounced for this list
                Register Bounce
            SAVE the list to disk
            UNLOCK the list
    SLEEP for SLEEPTIME
CLEANUP on exit
Advantages to this method:
(1) We process a number of bounces before writing out the list, reducing I/O (the real bottleneck) by a factor of x. When x is one, the algorithm almost degenerates to the current method.
(2) Since we always sleep on each pass, it gives other processes (like the Web GUI) a chance to read the list.
(3) By increasing x we control the number of bounces that get processed on each pass. The time it takes to extract the addresses gives other processes time to acquire the list lock and avoid "lockout".
(4) Since "in memory" bounce registration is very fast, we can do a lot of them while the list is locked without adding significantly to the already long lock time on a big list (I believe the I/O is the limiting factor).
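To make the proposal concrete, here is a minimal Python sketch of the batching loop described above. It is not Mailman code: `FakeList`, `process_pass`, `run`, `batch_size` (the x above) and `sleep_time` (SLEEPTIME) are hypothetical names, the list lock is modeled with a plain `threading.Lock`, and "save" only counts writes instead of touching disk. Rereading the list from disk is omitted, and the loop stops when the queue is empty rather than running forever.

```python
import threading
import time
from collections import defaultdict, deque


class FakeList:
    """Hypothetical stand-in for a mailing list; not Mailman's own class."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()      # models the per-list lock
        self.scores = defaultdict(float)  # address -> bounce score
        self.saves = 0                    # counts simulated disk writes

    def register_bounce(self, address):
        self.scores[address] += 1.0       # in-memory only, so it is cheap

    def save(self):
        self.saves += 1                   # a real save would write to disk


def process_pass(queue, lists, batch_size):
    """Dequeue at most batch_size bounces, then register them per list."""
    pending = defaultdict(list)           # listname -> bounced addresses
    for _ in range(min(batch_size, len(queue))):
        listname, address = queue.popleft()
        pending[listname].append(address)

    for listname, addresses in pending.items():
        mlist = lists[listname]
        with mlist.lock:                  # lock held only for this batch
            for address in addresses:
                mlist.register_bounce(address)
            mlist.save()                  # one write per list per pass
    return sum(len(a) for a in pending.values())


def run(queue, lists, batch_size=50, sleep_time=0.0):
    """Repeat passes until the queue drains, sleeping after each pass."""
    passes = 0
    while queue:
        process_pass(queue, lists, batch_size)
        passes += 1
        time.sleep(sleep_time)            # lets other processes get the lock
    return passes
```

With `batch_size=1` every bounce costs one lock and one save, which matches advantage (1) above; larger values trade a longer single lock hold for fewer writes, so the value would need tuning against real list sizes.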