On Fri, Feb 26, 2010 at 7:15 PM, Mark Sapiro <mark@msapiro.net> wrote:
On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <mark@msapiro.net> wrote:
Cedric Jeanneret wrote:
I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are Python bindings for xapian, I guess I could create a plugin for that. My first question: is there already such a thing? I searched the net, but found nothing. My second one: can we create a plugin for mailman, and if so, where can I find some documentation? There seems to be nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
Just to explain why I'd like to do that: we already have a xapian search engine in here, indexing a fileserver, request tracker queues and moinmoin wikis... so we'd like to aggregate all our stuff in one app for searching.
This will be quite doable with Mailman 3 which is still in development.
There are problems trying to do this in Mailman 2.1.x. There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at <http://wiki.list.org/x/l4A9>. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive.
A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set
PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py'
PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
in mm_cfg.py, then that script will be invoked to do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message.
Hello again,
Just one question: what do mlist, msg, msgdata stand for? As I read it, I have to create my module and define a "process(mlist, msg, msgdata)" function inside it, so I'd like to know what those objects are. I discovered that mlist stands for a Mailman.MailList.MailList('list-name'), but the others are a bit hard to find...
Only custom handlers need to define process(mlist, msg, msgdata). That is the entry point to the handler, and three objects are passed to it:
mlist is the Mailman.MailList.MailList() instance for the current list
msg is a Mailman.Message.Message() (subclass of email.Message.Message) instance for the current message
msgdata is a dictionary of the message metadata accumulated so far.
The important thing is these are passed in as arguments to the handler process() function.
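For illustration only, a custom handler built around that entry point might look roughly like the sketch below. The module name and the 'indexer_note' metadata key are made up for this example. The only things taken from Mailman are the process() signature, the list's internal_name() method and the fact that msg behaves like an email message and msgdata like a dictionary.

```python
# Hypothetical custom handler, e.g. saved as Mailman/Handlers/IndexNote.py.
# Only the process(mlist, msg, msgdata) signature is dictated by Mailman;
# everything the function does here is an illustrative assumption.


def process(mlist, msg, msgdata):
    """Record the list name and Message-ID in the message metadata."""
    # mlist is the MailList instance for the current list.
    listname = mlist.internal_name()
    # msg is an email-style message, so headers can be read with get().
    msgid = msg.get('message-id', '')
    # msgdata is a plain dictionary passed along to the later handlers.
    # 'indexer_note' is a made-up key, not something Mailman looks at.
    msgdata['indexer_note'] = (listname, msgid)
```

A handler like this only sees the message on its way through the pipeline, which is why it cannot know the final archive URL.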
In your case, you are defining a module which is going to be invoked like the following.
Suppose that
PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'
It will be invoked in a pipe similar to
cat raw_message | /path/to/myarch.py HOST LIST
i.e. the command string, with %(hostname)s and %(listname)s replaced by the actual host name and list name of the list, will be invoked and the message piped to its standard input.
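To make the substitution concrete, here is a small sketch of how such a command string expands and which argument ends up where. Mailman does the substitution itself; the function name and the host and list names below are hypothetical and only serve the illustration.

```python
import shlex


def build_archiver_command(template, hostname, listname):
    """Fill %(hostname)s and %(listname)s the way Python's % operator does."""
    return template % {'hostname': hostname, 'listname': listname}


# Example values; 'lists.example.org' and 'test-list' are made up.
template = '/path/to/myarch.py %(hostname)s %(listname)s'
cmd = build_archiver_command(template, 'lists.example.org', 'test-list')

# Split the command the way a shell would to see what the script receives.
argv = shlex.split(cmd)
script, host_arg, list_arg = argv[0], argv[1], argv[2]
```

With this template the host name arrives in the script as sys.argv[1] and the list name as sys.argv[2].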
So, it could begin something like:
#!python
import sys
sys.path.insert(0, 'path/to/mailman/bin')
# The above line can be skipped if myarch.py is in Mailman's
# bin directory.
import paths

import email
from Mailman import MailList
from Mailman import Message

# The raw message is piped to the script on standard input.
msg = email.message_from_file(sys.stdin, Message.Message)
# With the command line above, sys.argv[1] is the host name and
# sys.argv[2] is the list name.
mlist = MailList.MailList(sys.argv[2], lock=True)
At this point, you have a list object (locked) and a message object. You might think you could just do
mlist.ArchiveMail(msg)
to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is
from Mailman.Archiver import HyperArch
from cStringIO import StringIO
f = StringIO(str(msg))
h = HyperArch.HyperArchive(mlist)
h.processUnixMailbox(f)
h.close()
f.close()
Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point
mlist.Save()
mlist.Unlock()
and the message is now in the pipermail archive and can be indexed.
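As for the indexing step itself, a minimal sketch with the xapian Python bindings might look like the following. This is an assumption about how you might want to index, not something Mailman provides: the function names, the database path and the choice of fields are hypothetical, and the 'S' and 'Q' prefixes just follow the usual Xapian convention for subject and unique-ID terms.

```python
import email


def extract_index_fields(msg):
    """Collect the headers and plain-text body worth indexing."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        payload = part.get_payload(decode=True) or b''
        charset = part.get_content_charset() or 'utf-8'
        try:
            chunks.append(payload.decode(charset, 'replace'))
        except LookupError:
            # Unknown charset label: fall back rather than fail.
            chunks.append(payload.decode('utf-8', 'replace'))
    return {
        'message_id': msg.get('message-id', ''),
        'subject': msg.get('subject', ''),
        'body': '\n'.join(chunks),
    }


def index_message(db_path, fields):
    """Add or replace one message in a Xapian database (hypothetical path)."""
    # Imported here so the helper above can be used without xapian installed.
    import xapian

    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)
    doc = xapian.Document()
    indexer = xapian.TermGenerator()
    indexer.set_stemmer(xapian.Stem('en'))
    indexer.set_document(doc)
    # Subject terms are indexed both with an 'S' prefix and as free text.
    indexer.index_text(fields['subject'], 1, 'S')
    indexer.index_text(fields['subject'])
    indexer.increase_termpos()
    indexer.index_text(fields['body'])
    doc.set_data(fields['subject'])
    if fields['message_id']:
        # Use the Message-ID as a unique term so re-runs replace the entry.
        id_term = 'Q' + fields['message_id']
        doc.add_term(id_term)
        db.replace_document(id_term, doc)
    else:
        db.add_document(doc)
    db.commit()  # named flush() in older Xapian releases


# Usage with a made-up message; index_message() is not called here.
sample = email.message_from_string(
    'Message-ID: <1@example.org>\nSubject: Hello\n\nSome body text\n')
fields = extract_index_fields(sample)
```

In the archiver script you would call extract_index_fields(msg) on the message read from standard input and then pass the result to index_message() with the path of your existing database.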
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
wow, thanks a lot, with all this I'll be able to do what I want!
I'll post all my stuff as soon as I've done it, hopefully next week :).
Thanks again.
Best regards,
C.