Re: [Mailman-Users] Messages remain in spool/mailman/in

Sebastian Hagedorn wrote:
|
| --On 23. April 2008 18:28:44 -0700 Mark Sapiro <mark@msapiro.net> wrote:
|
|>> Update: I just noticed that the number of messages in "in" is
|>> decreasing. Could it be that something just overwhelmed Mailman so
|>> that now there is a backlog that's slowly being worked through? I
|>> don't think we ever had such issues prior to today.
|>
|> There are two possibilities. One is that your IncomingRunner is sliced
|
| It's not. I didn't even know about that feature until just now. Perhaps
| I should look into that.
|
|> The other is that somehow you are working through a flood of messages,
|> or some problem (maybe locks - see
|> <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.076.htp>)
|> is making IncomingRunner run slowly.
|
| Well, there really were some locking issues, see below.
|
|> A backlogged IncomingRunner is rare. It's more usual to see a
|> backlogged OutgoingRunner due to delays in SMTP.
|>
|> Check all of Mailman's logs for any clues.
|
| Let's start with mmdsr:
|
| Hourly Summary of Posts
Note that the 'post' log, which is the source of these data, is written when the message is sent, not when it arrives.
| -----------------------
| <snip>
|  126 11:00-11:59
|   97 12:00-12:59
|  121 13:00-13:59
| 1355 14:00-14:59
|  760 15:00-15:59
|  617 16:00-16:59
| 1293 17:00-17:59
|  207 18:00-18:59
| <snip>
|
| So starting around 2pm there was an unusually high number of posts. On
| other days there are never more than about 150 messages per hour. It
| turns out that one list by itself had 3,518 posts yesterday. And that
| list had locking issues:
|
| Apr 23 16:28:35 2008 (13393) jura-pruefungsamt.lock lifetime has expired, breaking
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/bin/qrunner", line 278, in ?
| Apr 23 16:28:35 2008 (13393)     main()
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/bin/qrunner", line 238, in main
| Apr 23 16:28:35 2008 (13393)     qrunner.run()
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
| Apr 23 16:28:35 2008 (13393)     filecnt = self._oneloop()
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop
| Apr 23 16:28:35 2008 (13393)     self._onefile(msg, msgdata)
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 171, in _onefile
| Apr 23 16:28:35 2008 (13393)     keepqueued = self._dispose(mlist, msg, msgdata)
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/Queue/VirginRunner.py", line 38, in _dispose
| Apr 23 16:28:35 2008 (13393)     return IncomingRunner._dispose(self, mlist, msg, msgdata)
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
| Apr 23 16:28:35 2008 (13393)     mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
| Apr 23 16:28:35 2008 (13393)     self.__lock.lock(timeout)
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/LockFile.py", line 306, in lock
| Apr 23 16:28:35 2008 (13393)     important=True)
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
| Apr 23 16:28:35 2008 (13393)     traceback.print_stack(file=logf)
| Apr 23 16:51:54 2008 (13389) jura-pruefungsamt.lock unexpected linkcount: -1
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/bin/qrunner", line 278, in ?
| Apr 23 16:51:54 2008 (13389)     main()
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/bin/qrunner", line 238, in main
| Apr 23 16:51:54 2008 (13389)     qrunner.run()
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
| Apr 23 16:51:54 2008 (13389)     filecnt = self._oneloop()
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop
| Apr 23 16:51:54 2008 (13389)     self._onefile(msg, msgdata)
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 171, in _onefile
| Apr 23 16:51:54 2008 (13389)     keepqueued = self._dispose(mlist, msg, msgdata)
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
| Apr 23 16:51:54 2008 (13389)     mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
| Apr 23 16:51:54 2008 (13389)     self.__lock.lock(timeout)
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/LockFile.py", line 287, in lock
| Apr 23 16:51:54 2008 (13389)     self.__linkcount(), important=True)
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
| Apr 23 16:51:54 2008 (13389)     traceback.print_stack(file=logf)
|
| It turns out that our law school department informed all their students
| of some news. They did so by sending separate mails to each student,
| each time Cc'ing themselves on their Mailman list ...
Ouch!
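
Incidentally, an hourly summary like the one above can be tallied straight from the 'post' log; that is essentially where mmdsr gets it. A minimal sketch in Python 2 (which is what Mailman 2.1 runs on); the log path is an assumption and varies by installation:

    import re
    from collections import defaultdict

    # Count posts per hour from Mailman's 'post' log.  Log lines begin
    # like: "Apr 23 14:05:12 2008 (13393) post to listname ...".
    counts = defaultdict(int)
    for line in open('/var/log/mailman/post'):   # path is an assumption
        m = re.match(r'\w{3} +\d+ +(\d\d):\d\d:\d\d \d{4}', line)
        if m:
            counts[m.group(1)] += 1
    for hour in sorted(counts):
        print '%5d %s:00-%s:59' % (counts[hour], hour, hour)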
| I will talk to them about that, but still I'd like to know how to avoid
| such a situation. Would slices have helped here?
No. Slicing would make things worse by increasing lock contention. Since all the posts were for one list, and since the list is locked by IncomingRunner for the entire duration of its processing of a message, slicing wouldn't help. It normally helps by increasing parallelism, but only if the load is across multiple lists so that the sliced runners aren't blocking each other.
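
For reference, slicing is configured by copying the QRUNNERS table from Defaults.py into mm_cfg.py and raising the count for the runner in question. A minimal sketch for Mailman 2.1; slice counts must be powers of two, and Mailman must be restarted afterwards:

    # mm_cfg.py -- sketch of slicing IncomingRunner four ways, adapted
    # from the QRUNNERS table in Defaults.py (Mailman 2.1).
    QRUNNERS = [
        ('ArchRunner',     1),
        ('BounceRunner',   1),
        ('CommandRunner',  1),
        ('IncomingRunner', 4),   # default is 1
        ('NewsRunner',     1),
        ('OutgoingRunner', 1),
        ('VirginRunner',   1),
        ('RetryRunner',    1),
        ]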
In this case, IncomingRunner was doing the best that could be done. I see that VirginRunner was involved in the locking, so perhaps these messages were being held and moderator notices were being generated. Then perhaps a moderator went to the admindb interface, which can keep a list locked for a long time if it is processing a large list of requests.
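
To see why a single busy list serializes everything, here is roughly the pattern IncomingRunner._dispose() follows, as the tracebacks above show; a simplified sketch, not the verbatim source:

    from Mailman import LockFile, mm_cfg

    def _dispose(self, mlist, msg, msgdata):
        try:
            # With a flood on a single list, every runner serializes here.
            mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
        except LockFile.TimeOutError:
            # Couldn't get the lock in time: leave the message queued
            # and retry on a later pass.
            return True
        try:
            # ... run the message through the list's handler pipeline ...
            pass
        finally:
            mlist.Save()
            mlist.Unlock()
        return False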
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

--On 24. April 2008 07:38:14 -0700 Mark Sapiro <mark@msapiro.net> wrote:
>> I'd like to know how to avoid such a situation. Would slices have
>> helped here?
>
> No. Slicing would make things worse by increasing lock contention.
OK, thanks.
> Since all the posts were for one list, and since the list is locked by
> IncomingRunner for the entire duration of its processing of a message,
> slicing wouldn't help. It normally helps by increasing parallelism, but
> only if the load is across multiple lists so that the sliced runners
> aren't blocking each other.
With only 2 to 3 messages per minute under normal conditions, that doesn't seem necessary for us.
> In this case, IncomingRunner was doing the best that could be done. I
> see that VirginRunner was involved in the locking, so perhaps these
> messages were being held and moderator notices were being generated.
> Then perhaps a moderator went to the admindb interface, which can keep
> a list locked for a long time if it is processing a large list of
> requests.
Ah, I wasn't even aware of that. Anyway, they promised not to do it like that anymore, so I guess we're fine. Thanks for your help!
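
For what it's worth, a backlog like this can be spotted early by watching the queue directories. A minimal sketch, assuming the queues live under /var/spool/mailman (the path varies by distribution):

    import os

    QFILES = '/var/spool/mailman'   # assumption; check your installation
    for name in sorted(os.listdir(QFILES)):
        qdir = os.path.join(QFILES, name)
        if not os.path.isdir(qdir):
            continue
        # Each queued message is stored as a .pck pickle file.
        backlog = len([f for f in os.listdir(qdir) if f.endswith('.pck')])
        print '%-10s %5d' % (name, backlog)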
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
.:.Zentrum für angewandte Informatik - Universitätsweiter Service RRZK.:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
.:.:.:.Skype: shagedorn.:.:.:.