I like the idea of process queues, but I don't want to take the federation-of-processes architecture too far. Yes, we want a component architecture, but where I see the process boundaries is at the message queue level.
For the delivery of messages, I see Mailman's primary job as moderation-and-munge. Messages come into the system from the MTA, the nntp-scraper, the web-board poster, or are internally crafted. All of these end up in the incoming queue. They need to be approved, rewritten, moderated, and eventually sent on to various outbound queues: nntp-poster, smtp-delivery, archiver, etc. Some of these are completely independent of the Mailman databases. E.g. it is a mistake that SMTPDirect is in the message pipeline in 2.0, because once a message hits this component, its future disposition is (largely) independent of the rest of the system.
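To make the shape of this concrete, here's a minimal sketch of the moderate-and-munge loop. The handler names, queue layout, and "pass everything" policy are all hypothetical, not Mailman's actual API:

```python
import os

def approve(msg):
    # Placeholder moderation policy: pass everything through.
    # A real handler would consult the list's moderation settings.
    return msg

def add_list_headers(msg):
    # The "munge" step: prepend list headers to the raw RFC822 text.
    return 'List-Id: example.list\n' + msg

HANDLERS = [approve, add_list_headers]

def process_incoming(incoming_dir, outbound_dir):
    """Run each queued message through the handler chain; survivors
    are dropped fully formed into the outbound queue."""
    for name in os.listdir(incoming_dir):
        path = os.path.join(incoming_dir, name)
        with open(path) as f:
            msg = f.read()
        for handler in HANDLERS:
            msg = handler(msg)
            if msg is None:
                break          # held for moderation or discarded
        else:
            with open(os.path.join(outbound_dir, name), 'w') as f:
                f.write(msg)
        os.remove(path)
```

The point is that everything downstream of `outbound_dir` is somebody else's problem.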
So in my view, when Mailman decides that a message can be delivered to a membership list, it's dropped fully formed in an outbound queue. The file formats are the interface b/w Mailman and the queue runners and should be platform (i.e. Python) independent. That way, I can ship a simple queue runner that takes messages from the outbound queue and hands them off to the smtpd, but /you/ could drop in a different runner process that uses GNQS to distribute load across an infinitely expandable smtpd server farm.
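One way to get that platform independence is to make the queue entry nothing but an RFC822 text file whose appearance is atomic. A sketch, assuming a flat queue directory and a `.msg` naming convention (both assumptions, not a spec):

```python
import os
import tempfile

def enqueue(queue_dir, msgid, text):
    """Drop a fully formed RFC822 text into a queue directory.
    Write-then-rename makes the file appear atomically, so any
    queue runner -- Python or not -- sees only complete messages."""
    fd, tmp = tempfile.mkstemp(dir=queue_dir)
    os.write(fd, text.encode())
    os.close(fd)
    os.rename(tmp, os.path.join(queue_dir, msgid + '.msg'))

def claim(queue_dir):
    """Claim one queued message by renaming it; rename is atomic,
    so two runner processes can never grab the same file."""
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith('.msg'):
            continue
        src = os.path.join(queue_dir, name)
        dst = src + '.claimed'
        try:
            os.rename(src, dst)
        except OSError:
            continue      # another runner got there first
        return dst
    return None
```

Since the on-disk file is the whole interface, the GNQS-farm runner and the simple Python runner can coexist against the same queue.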
[Side note. Here's another reason why I'm keen on ZODB/ZEO as the underlying persistency mechanism for internal Mailman data: I believe we can parallelize the moderate-and-munge part of message processing. Because the ZEO protocols serialize writes at commit time, you could have multiple moderate-and-munge processes running on a server farm and guarantee db consistency across them. What I don't know is how ZEO would perform given a write-intensive environment (and maybe Mailman isn't as write intensive as I think it is). But even if it sucks, it simply means that the moderate-and-munge part won't be efficiently parallelizable until that's fixed.]
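The consistency guarantee works because ZEO writers are optimistic: each transaction commits against the state it started from and replays on conflict. Here's a toy simulation of that pattern (a stand-in `VersionedStore`, not the ZODB/ZEO API) just to show why parallel moderate-and-munge workers can't corrupt the db:

```python
class ConflictError(Exception):
    """Raised when another writer committed first, as ZEO does."""

class VersionedStore:
    """Toy stand-in for a ZEO-backed database: writes are
    serialized at commit time by a version check."""
    def __init__(self):
        self.version = 0
        self.data = {}

    def begin(self):
        # Snapshot the version this transaction started from.
        return self.version, dict(self.data)

    def commit(self, seen_version, new_data):
        if self.version != seen_version:
            raise ConflictError   # someone else committed first
        self.data = new_data
        self.version += 1

def moderate_and_munge(store, msgid):
    """One worker: retry the whole transaction on conflict, which
    is how parallel writers stay consistent without locks."""
    while True:
        version, snapshot = store.begin()
        snapshot[msgid] = 'approved'      # the munge step
        try:
            store.commit(version, snapshot)
            return
        except ConflictError:
            continue                      # replay against fresh state
```

The performance question above is exactly about how often that `ConflictError` branch fires under write-heavy load.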
"JCL" == J C Lawrence <claw@kanga.nu> writes: "CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:
JCL> There are five basic transition points for a message passing
JCL> thru a mailing list server:
| 1) Receipt of message by local MTA
| 1a) passthrough of message via a security wrapper from MTA to
| list server... (I think it's important we remember that, because
| we can't lose it, and it involves a layer of passthrough and a
| process spawning, so it's somewhat heavyweight -- but
| indispensable)
No problems here, because I see these as being outside the bounds of the MLM. The MLM has an incoming queue and it expects messages in a particular format (very likely just RFC822 text files). These arrive here via whatever tortuous path is necessary: MTA->security wrapper, nntpd->news scraper, web board cgi poster, etc.
| 2) Receipt by list server
| 3) Approval/editing/moderation
What I've been calling moderate-and-munge.
| 4) Processing of message and emission of any resultant message(s)
Here's where the output queues and process boundaries come in. Once they're in the outbound queues, Mailman's out of the loop.
| 5) Delivery of message to MTA for final delivery.
Again, that's the responsibility of the mta-qrunner, be it a simple minded Python process like today's qrunner, or a batch processing system like you've been investigating.
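A simple-minded runner's delivery step might look like this. The SMTP connection is passed in (e.g. `smtplib.SMTP('localhost')`) precisely so a fancier runner can swap in something else; error handling and batching are elided:

```python
from email.parser import Parser

def deliver_one(path, smtp):
    """One step of a simple mta-qrunner: parse a queued RFC822
    file and hand it to an SMTP-speaking object.  Returns the
    dict of refused recipients, smtplib-style, so hard errors
    can be re-injected into the bounce queue."""
    with open(path) as f:
        msg = Parser().parse(f)
    recipients = [a.strip() for a in msg.get('To', '').split(',')
                  if a.strip()]
    refused = smtp.sendmail(msg.get('From', ''), recipients,
                            msg.as_string())
    return refused
```

(In real life the recipient list would come from the membership roster, not the To: header; this is just the shape of the hand-off.)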
These processes are not completely independent of Mailman though, e.g. for handling hard errors at smtp transaction time or URL generation for summary digests. Some of these can be handled by re-injection into the message queues (e.g. generate a bounce message and stick it in the bounce queue), but some may need an rpc interface.
| 6) delivery of message to non-MTA recipients (the archiver, the
| logging thing, the digester, the bounce processor....)
Each of these should be separate queues with defined process interfaces, but again there may be synchronous information communicated back to Mailman. The archiver discussions we've had come to mind here.
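The fan-out itself is trivial once each consumer owns a queue directory. A sketch, with a hypothetical set of queue names:

```python
import os
import shutil

# Hypothetical consumer registry: each post-delivery process
# (archiver, digester, nntp-poster, bounce processor) watches its
# own directory and never touches Mailman's internals.
QUEUES = ('archive', 'digest', 'nntp', 'bounce')

def fan_out(msg_path, queue_root, targets):
    """Copy one approved message into each target queue.  Because
    the on-disk file is the whole interface, swapping pipermail
    for mhonarc just means pointing a different process at the
    'archive' directory."""
    for target in targets:
        qdir = os.path.join(queue_root, target)
        if not os.path.isdir(qdir):
            os.makedirs(qdir)
        shutil.copy(msg_path, qdir)
```

Anything that genuinely needs to talk back to Mailman (the archiver URL question, say) is where the rpc interface comes in.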
CVR> and besides, they are basically independent, asynchronous
CVR> processes that don't need to be managed by any of the core
CVR> logic, other than handing messages into their queue and
CVR> making sure they stay running. same with, IMHO, storing
CVR> messages for archives, storing messages for digests, updating
CVR> archives, processing digests (but the processed digest is fed
CVR> back into the core logic for delivery), and whatever else we
CVR> decide it needs to do that isn't part of the core,
CVR> time-sensitive code base. (in fact, there's no reason why you
CVR> couldn't have multiple flavors of these things, feeding
CVR> archives into an mbox, another archiver into mhonarc or
CVR> pipermail, something that updates the search engine indexes,
CVR> and text and mime digesters... by turning them into their own
CVR> logic streams with their own queues, you effectively have
CVR> just made them all plug-in swappable, because you're writing
CVR> to a queue, and not worrying about what happens once it's
CVR> there. you merely need to make sure it goes in the right
CVR> queue, in the approved format.
I agree!
-Barry