
"JCL" == J C Lawrence <claw@kanga.nu> writes:
JCL> Configuration of what exactly happens to a message is done
JCL> by dropping scripts/programs in specially named directories (it
JCL> is expected that typically only SymLinks will be dropped
JCL> (which makes the web interface easy -- just creates and moves
JCL> symlinks about)).
At a high level, what you're describing is a generalization of MM2's message handler pipeline. In that respect, I'm in total agreement. It's a nice touch to have separate pipelines between each queue boundary, with return codes directing the machinery as to the future disposition of the message.
But I don't like the choice of separate scripts/programs as the basic components of this pipeline. Let me rotate that just a few degrees to the left, squint my eyes, and change the scripts to Python modules, and the return codes to return values or exceptions. Then I'm sold: I think you can do everything you want (including using separate scripts if you want), and it's more efficient for the common situations.
First, we don't need to mess with symlinks to make processing order configurable. We simply change the order of entries in a sequence (read: Python list). It's a trivial matter to allow list admins to select the names of the components they want, the order, etc. and to keep this information on a per-list basis. Actually, the web interface I imagine doesn't give list admins configurability at that fine a grain. Instead, a site administrator can set up list "styles" or patterns, one of which includes canned filter sets; i.e. predefined component orderings created, managed, named, and made available by the site administrator.
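To make that concrete, here's a minimal sketch of a pipeline-as-list, under my assumptions; the names (HANDLERS, process_message, the style lists) are illustrative, not any actual Mailman API. Reordering the pipeline is just reordering a Python list, and a site admin's canned "styles" are just named lists:

```python
# Site-wide registry of available pipeline components (hypothetical).
# Real components would be modules/classes; lambdas keep the sketch short.
HANDLERS = {
    'hold': lambda msg: msg + ['held'],
    'munge': lambda msg: msg + ['munged'],
    'deliver': lambda msg: msg + ['delivered'],
}

# Per-list "styles": predefined component orderings a site admin
# creates, names, and makes available to list admins.
announce_style = ['munge', 'deliver']
discuss_style = ['hold', 'munge', 'deliver']

def process_message(msg, pipeline):
    """Run msg through each named component, in list order."""
    for name in pipeline:
        msg = HANDLERS[name](msg)
    return msg
```

No symlinks to shuffle: changing a list's filter set is changing which list of names is stored for it.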
Second, it's more efficient because I imagine Mailman 3.0 will be largely a long running server process, so modules need only be imported once as the system warms up. Even re-importing in a one-shot architecture will be more efficient than starting and stopping scripts all the time, because of the way Python modules cache their bytecodes (pyc files).
Third, you can still do separate scripts/programs if you want or need. Say there's something you can only do by writing a separate Java program to interface with your corporate backend Subject: header munger. You should be able to easily write a pipeline module that hides all that in the implementation. You can even design your own efficient backend IPC protocol to talk to whatever external resource you need to talk to. I contend that the overhead and complexity of forking off scripts, waiting for their exit codes, process management, etc. etc. just isn't necessary in the common case, where 5 or 50 lines of Python will do the job nicely.
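For the cases that really do need an external program, a sketch of how a pipeline module could hide the fork-and-exec behind the ordinary module interface (class and method names are my invention, not a Mailman interface):

```python
import subprocess

class ExternalMunger:
    """Hypothetical pipeline component wrapping an external program,
    e.g. a corporate Java Subject: header munger."""

    def __init__(self, command):
        # argv list for the external program, e.g. ['java', 'SubjectMunger']
        self.command = command

    def process(self, msg_text):
        # Feed the message on stdin, take the munged message from
        # stdout; a nonzero exit code becomes an ordinary Python
        # exception for the pipeline machinery to handle.
        result = subprocess.run(self.command, input=msg_text,
                                capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError('external munger failed: %d'
                               % result.returncode)
        return result.stdout
```

The forking overhead is then paid only by the components that actually need it, in their implementation, rather than by the pipeline machinery for every component.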
Fourth, yes, maybe it's a little harder to write these components in Perl, bash, Icon or whatever. That doesn't bother me. I'm not going to make it impossible, and in fact, I think that if that were to become widely necessary, a generic process-forking module could be written and distributed.
I don't think this is very far afield of what you're describing, and it has performance and architectural benefits IMO. We still formalize the interface that pipeline modules must conform to, probably spelled like a Python class definition, with elaborations accomplished through subclassing.
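One way that formalized interface might be spelled, purely as a sketch (the class names, process() signature, and DiscardMessage exception are assumptions of mine, not a settled design):

```python
class Handler:
    """Hypothetical base class every pipeline component conforms to;
    elaborations come from subclassing."""

    def process(self, msg):
        """Return the (possibly modified) message to continue the
        pipeline, or raise to change the message's disposition."""
        raise NotImplementedError

class DiscardMessage(Exception):
    """Raised by a handler to stop the pipeline and drop the message."""

class SubjectPrefixer(Handler):
    """Example elaboration: prepend a list prefix to the subject."""

    def __init__(self, prefix):
        self.prefix = prefix

    def process(self, msg):
        msg['subject'] = self.prefix + ' ' + msg.get('subject', '')
        return msg

def run_pipeline(msg, handlers):
    """Exceptions play the role that exit codes play in the
    script/program model: they direct the message's future."""
    try:
        for handler in handlers:
            msg = handler.process(msg)
    except DiscardMessage:
        return None
    return msg
```

Return values and exceptions replace exit codes, with no process management in the common case.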
Does this work for you? Is there something a script/program component model gives you that the class/module approach does not?
-Barry