decoupling the web interface from the exploder.
All right, I haven't done any Mailman hacking for a while, but I wanted to work on adding this feature.
Let me explain why.
Right now, it's essentially impossible to have more than one machine doing Mailman processing. (Yes, there are ways to hack around it, but they get ugly quickly.)
It's also impossible to have the web interface on any machine other than the machine where list explosion happens. This is suboptimal when you have more than one web server or more than one mail host.
Let me give an example, from the real world, aka my setup.
                   |---------|
               /---| MR/web  |---\   |
|----------|  /    |---------|    \  |  |----------|
| internet |-<                     >-|--|int. mail |
|----------|  \    |---------|    /  |  |----------|
               \---| MR/web  |---/   |
                   |---------|       ^
                                 Firewall
MR == Mail Relay. All mail/web traffic is load balanced.
Now, I can't simply relay all traffic for mailing lists inside the firewall, because then I would no longer have access to the web interface. Therefore I have to special case all my lists, to have them exploded on the MR machine, then relayed to the internal mail machine for delivery.
That part works fine for one machine, but it doesn't work for 1+N. Instead, what you have to do is have each of the N MR machines forward all mailing-list traffic to one machine, and put redirects in your web server to again point to that one machine.
That's kinda ugly, and defeats the purpose of having multiple identical relays.
What I'd really like is a way to have the web interface be a client which could get/set any mailing list configuration through some sort of backend. Then it would be trivial to just relay all lists into the internal machine(s) for delivery.
So, if I understand it correctly, there are four things that need to be abstracted, and backends written (ideally, for me, to use a database):
1. membership lists
2. list config
3. templates
4. archives
Now (1) seems to have been done already with MemberAdaptor, and (2) shouldn't be that hard to do either, but will require changing a number of the assumptions made in MailList.py and probably elsewhere.
(3) is annoying, but shouldn't be too hard to handle. Again, there just needs to be a defined interface.
The most challenging one is (4), which is easy enough to do for the case of one delivery machine, but much harder for the 1+N case. It also needs to be abstracted.
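To make the idea concrete, here is a rough sketch of what such a backend interface might look like. All class and method names here are hypothetical, not the existing MemberAdaptor API, and a real implementation would sit on a database instead of an in-memory dict; the point is just that the web CGI on one machine and the exploder on another would both talk to the same backend.

    class ListConfigBackend:
        """Hypothetical interface shared by the web UI and delivery machines."""

        def get_option(self, listname, key):
            raise NotImplementedError

        def set_option(self, listname, key, value):
            raise NotImplementedError

        def members(self, listname):
            raise NotImplementedError

        def add_member(self, listname, address):
            raise NotImplementedError


    class DictBackend(ListConfigBackend):
        """Toy in-memory implementation; a real one would use a database."""

        def __init__(self):
            self._options = {}      # (listname, key) -> value
            self._members = {}      # listname -> set of addresses

        def get_option(self, listname, key):
            return self._options[(listname, key)]

        def set_option(self, listname, key, value):
            self._options[(listname, key)] = value

        def members(self, listname):
            return sorted(self._members.get(listname, ()))

        def add_member(self, listname, address):
            self._members.setdefault(listname, set()).add(address)


    # Usage: the admin CGI and the delivery host would both point the same
    # backend at a shared database, instead of sharing config.db over NFS.
    backend = DictBackend()
    backend.set_option('mylist', 'max_message_size', 40)
    backend.add_member('mylist', 'user@example.com')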
So, I'll probably be able to devote time over the next couple months to writing this, but I'm interested in how people feel such a beast should look, especially (4).
Thoughts?
Darrell
[Barry, question for you further down]
On Wed, Jan 23, 2002 at 03:22:43PM -0800, Darrell Fuhriman wrote:
Right now, it's essentially impossible to have more than one machine doing Mailman processing. (Yes, there are ways to hack around it, but they get ugly quickly.)
I have a mail server which automatically rewrites the envelope-to of mailman lists to go to the mailman-only mail server.
It's also impossible to have the web interface on any machine other than the machine where list explosion happens.
Yes and no. Back in the mm2.0 days, one of my requirements for upgrading mailman on sourceforge.net was to have redundancy and load balancing.
You _can_ export ~mailman over NFS. The problem was that with linux 2.2 back then, under very high load and lock contention (I sent 1000 messages to the same list on the two different mail servers to force them to fight over ~mailman/lists/listname/config.db), I was able to find a race condition in NFS rename/unlinks which caused 3 messages out of the 2000 to bounce. (I don't know if it's NFS-only, but after getting config.db corruption on 3 messages: when mailman renames config.db.last to config.db, there is a very small time window when there is no config.db, and my exim with its auto list detection failed to stat config.db and reported that the list didn't exist.)
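For what it's worth, the "no config.db for a moment" window can be avoided on a local filesystem by writing the new contents to a temporary name and then renaming it over the target, since rename(2) atomically replaces the destination; NFS guarantees are weaker, which is presumably what bites here. A rough sketch of that pattern (hypothetical helper, not Mailman's actual save code):

    import os

    def save_config(path, data):
        # Write the new contents to a temporary file first, then rename it
        # over the real name.  On a local filesystem rename(2) atomically
        # replaces the target, so readers always see either the old or the
        # new config.db, never a missing file.  Over NFS the semantics are
        # weaker, which is the kind of window described above.
        tmp = path + '.tmp.%d' % os.getpid()
        fp = open(tmp, 'wb')
        try:
            fp.write(data)
            fp.flush()
            os.fsync(fp.fileno())
        finally:
            fp.close()
        os.rename(tmp, path)    # atomic replace of config.db (locally)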
For more details, see this message in the archives:
From: Marc MERLIN <marc_news@valinux.com>
To: mailman-developers@python.org
Subject: Doing load balancing with mailman
Message-ID: <20001117130521.V9808@marc.merlins.org>
Date: Fri, 17 Nov 2000 13:05:21 -0800
and the following thread:
From: Marc MERLIN <marc_news@valinux.com>
To: mailman-developers@python.org
Subject: about qrunner and locking
Message-ID: <20001207162234.D25463@marc.merlins.org>
Date: Thu, 7 Dec 2000 16:22:34 -0800
Barry: With the new qrunner infrastructure, does qrunner still need to lock the lists during delivery? If qrunner doesn't modify config.db anymore, could it open config.db read only?
The reason I ask is that, while the current sourceforge.net list server is still doing ok with 16,000+ lists, 600,000 Emails a day or so, and full SMTP callbacks on each incoming message, I'm still getting pressure to load balance the machine, especially for the high availability part :-)
If I can have qrunner not lock config.db, the only lock contention I'll have is when lists are modified through the web, and even if I share ~mailman over NFS, I'm confident that we won't hit some NFS race condition just because the same list is being modified by two different admins at the same exact nanosecond.
So, I'll probably be able to devote time over the next couple months to writing this, but I'm interested in how people feel such a beast should look, especially (4).
I think the NFS approach is the simplest by far :-) It even works today if you don't deliver thousands of messages in a few seconds to the same list :-) (actually with linux 2.4 or some other OS, and mailman 2.1, the bug may not be triggered anymore)
Marc
Microsoft is to operating systems & security what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f@merlins.org for PGP key
"MM" == Marc MERLIN <marc_news@vasoftware.com> writes:
MM> You _can_ export ~mailman over NFS. The problem was that with
MM> linux 2.2 back then, under very high load and lock contention
MM> (I sent 1000 messages to the same list on the two different
MM> mail servers to force them to fight over
MM> ~mailman/lists/listname/config.db). I was able to find a race
MM> condition in NFS rename/unlinks which caused 3 messages out of
MM> the 2000 to bounce. (I don't know if it's under NFS only, but
MM> after getting config.db corruption on 3 messages, when mailman
MM> renames the config.db.last to config.db, there was a very
MM> small time window when there was no config.db, and my exim
MM> with auto list detection failed to stat the config.db, and
MM> stated that the list didn't exist).
I believe that the algorithm that LockFile uses should be safe across NFS, modulo system bugs of course. I haven't run any stress tests against it in a looong time, but I once did, and don't ever remember seeing the bug you describe. I also don't remember the kernel rev, but it was a 2.2.something I'm sure. Maybe I was just (un)luckier than you.
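For reference, the classic NFS-safe technique is hard-link based locking, and as I understand it Mailman's LockFile builds on that idea (plus lock lifetimes and stale-lock breaking). A simplified sketch of the technique, not the actual LockFile.py code:

    import os, socket, time

    def acquire_lock(lockfile, timeout=15):
        """Classic NFS-safe hard-link lock (sketch only; Mailman's LockFile.py
        is more elaborate and also handles lock lifetimes and stale locks)."""
        # Each claimant writes its own uniquely named file...
        unique = '%s.%s.%d' % (lockfile, socket.gethostname(), os.getpid())
        open(unique, 'w').close()
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                # ...and tries to hard-link it to the shared lock name.
                # link(2) succeeds for exactly one claimant; even if the NFS
                # reply is lost, the link count on our unique file tells us
                # whether we actually won.
                os.link(unique, lockfile)
            except OSError:
                pass
            if os.stat(unique).st_nlink == 2:
                return unique              # we hold the lock
            time.sleep(0.25)
        os.unlink(unique)
        raise RuntimeError('timed out waiting for %s' % lockfile)

    def release_lock(lockfile, unique):
        os.unlink(lockfile)
        os.unlink(unique)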
MM> Barry: With the new qrunner infrastructure, does qrunner still
MM> need to lock the lists during delivery? If qrunner doesn't
MM> modify config.db anymore, could it open config.db read only?
Remember that now, we have usually 7 queues, and each one has its own runner process. One of the advantages of this is that we can really isolate lock acquisition to a finer granularity. In fact, OutgoingRunner -- which processes qfiles/out files, and thus is the process that actually calls SMTPDirect -- does not lock the lists for the normal delivery processing. It simply shovels messages from the queue to smtpd and doesn't need to update any list information, as that's all done before the message gets to the outgoing queue.
There's one exception (of course ;). If your smtpd ever returns synchronous errors, then Mailman has to lock the list in order to register bounces. However, Mailman only does this periodically, and this is controllable by the variable DEAL_WITH_PERMFAILURES_EVERY in Mailman/Queue/OutgoingRunner.py (it's not a mm_cfg.py variable).
By default this is set to 1, but you could crank it up so that the culling of the known bounces is done less frequently. OTOH, if your MTA is set up to never do recipient tests/deliveries synchronously (and you're not delivering to local users), you should never have such delivery failures to deal with, thus you'd never need to lock the list.
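A rough sketch of the mechanism described above -- not the actual OutgoingRunner code, just the shape of it: synchronous permanent failures are accumulated per list, and the list lock is only taken when they are periodically culled, so the normal delivery path never locks.

    # Hypothetical sketch; the real knob lives in Mailman/Queue/OutgoingRunner.py.
    DEAL_WITH_PERMFAILURES_EVERY = 1

    class SketchOutgoingRunner:
        def __init__(self):
            self._permfailures = {}    # listname -> [addresses]
            self._counter = 0

        def deliver(self, listname, msg, recipients, smtp_send):
            # Normal case: shovel the message to the MTA; no list lock held.
            # smtp_send is a stand-in for the SMTPDirect handler and returns
            # the addresses that failed synchronously and permanently.
            failed = smtp_send(msg, recipients)
            if failed:
                self._permfailures.setdefault(listname, []).extend(failed)
            self._counter += 1
            if self._counter >= DEAL_WITH_PERMFAILURES_EVERY:
                self._cull(listname)
                self._counter = 0

        def _cull(self, listname):
            addrs = self._permfailures.pop(listname, [])
            if not addrs:
                return
            # Only here would the real runner lock the list to register the
            # bounces it has collected since the last cull.
            print('locking %s to register %d bounces' % (listname, len(addrs)))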
-Barry
On Tue, Jan 29, 2002 at 02:40:56PM -0500, Barry A. Warsaw wrote:
MM> Barry: With the new qrunner infrastructure, does qrunner still
MM> need to lock the lists during delivery? If qrunner doesn't
MM> modify config.db anymore, could it open config.db read only?
Remember that now, we have usually 7 queues, and each one has its own
Yep.
runner process. One of the advantages of this is that we can really isolate lock acquisition to a finer granularity. In fact, OutgoingRunner -- which processes qfiles/out files, and thus is the process that actually calls SMTPDirect -- does not lock the lists for the normal delivery processing. It simply shovels messages from the queue to smtpd and doesn't need to update any list information, as that's all done before the message gets to the outgoing queue.
Awesome, just what I was hoping for.
There's one exception (of course ;). If your smtpd ever returns synchronous errors, then Mailman has to lock the list in order to register bounces. However, Mailman only does this periodically, and this is controllable by the variable DEAL_WITH_PERMFAILURES_EVERY in Mailman/Queue/OutgoingRunner.py (it's not a mm_cfg.py variable).
That's not a real problem. Even if there is a race condition somewhere, this case should be sufficiently rare for the race not to happen.
By default this is set to 1, but you could crank it up so that the culling of the known bounces is done less frequently. OTOH, if your
Good to know.
MTA is set up to never do recipient tests/deliveries synchronously (and you're not delivering to local users), you should never have such delivery failures to deal with, thus you'd never need to lock the list.
Thanks a bunch. This should definitely allow for load sharing and some failover capability. (The second server periodically rsyncs the NFS-exported mailman tree; if the first list server, which exports ~mailman minus qfiles, a symlink to local disk, dies, you can move a symlink and continue with one server and an rsynced copy of the whole tree that is hopefully not too old.)
Marc