[Mailman-Users] Indexing mail right after delivery
Cedric Jeanneret
cedric.jeanneret at camptocamp.com
Tue Mar 2 12:41:35 CET 2010
On Fri, 26 Feb 2010 10:15:13 -0800
Mark Sapiro <mark at msapiro.net> wrote:
> On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
> > On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <mark at msapiro.net>
> > wrote:
> >
> >> Cedric Jeanneret wrote:
> >>>
> >>> I'm trying to create a xapian[1] indexer for our mailing list. As
> >>> mailman is written in Python and there are python bindings for
> >>> xapian, I guess I can maybe create a plugin for that. My first
> >>> question is: is there already such a thing? I searched on the
> >>> net, but nothing turned up. My second one: can we create a plugin
> >>> for mailman, and if so, where can I find some documentation? There
> >>> seems to be nothing in the wiki
> >>> (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
> >>>
> >>>
> >>>
> >>> Just to explain why I'd like to do that: we already have a xapian
> >>> search engine in here, indexing a fileserver, request tracker
> >>> queues and moinmoin wikis... so we'd like to aggregate all our
> >>> stuff in one app for searching.
> >>
> >>
> >> This will be quite doable with Mailman 3 which is still in
> >> development.
> >>
> >> There are problems trying to do this in Mailman 2.1.x. There is a
> >> plugin capability of sorts in the form of custom handlers that can
> >> be added to the incoming message processing pipeline. See the FAQ
> >> at <http://wiki.list.org/x/l4A9>. However, archiving is
> >> asynchronous with incoming message processing, so it is not
> >> possible for a custom handler to know the URL that will ultimately
> >> retrieve the message from the archive.
> >>
> >> A different approach which might be workable is to use the
> >> PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If
> >> you set
> >>
> >> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py'
> >> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
> >>
> >> in mm_cfg.py, then that script will be invoked to do the archiving.
> >> The script in turn could invoke the standard pipermail archiving
> >> process and then invoke xapian to index the archived message.
> >>
> >
> >
> > Hello again,
> >
> > Just one question: what do mlist, msg and msgdata stand for? As I
> > read it, I have to create my module and define a
> > "process(mlist, msg, msgdata)" function inside it, so I'd like to
> > know what those objects are. I discovered that mlist is a
> > Mailman.MailList.MailList('list-name') instance, but for the
> > others, it's a bit hard to find...
>
>
> Only custom handlers need to define process(mlist, msg, msgdata). That
> is the entry point to the handler, and three objects are passed:
>
> mlist is the Mailman.MailList.MailList() instance for the current list
>
> msg is a Mailman.Message.Message() (subclass of email.Message.Message)
> instance for the current message
>
> msgdata is a dictionary of the message metadata accumulated so far.
>
> The important thing is these are passed in as arguments to the handler
> process() function.
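
For illustration, a minimal sketch of what such a handler module might look like. It only uses the standard email API and a plain dictionary, so it can be tried without Mailman installed; the header name X-Seen-By-Indexer and the msgdata key 'indexer_seen' are hypothetical and not part of Mailman.

```python
# Hypothetical custom handler sketch; the header and key names are made up.
import email


def process(mlist, msg, msgdata):
    """Handler entry point, called with the three objects described above.

    mlist   -- the list object (unused in this sketch)
    msg     -- the message object, which supports the email.Message API
    msgdata -- dictionary of metadata accumulated so far
    """
    # Tag the message and record in the metadata that we saw it.
    msg['X-Seen-By-Indexer'] = 'yes'
    msgdata['indexer_seen'] = True


# Stand-alone exercise of the entry point with plain stand-in objects.
demo_msg = email.message_from_string('Subject: hello\n\nbody\n')
demo_data = {}
process(None, demo_msg, demo_data)
```

In a real installation the pipeline would supply mlist, msg and msgdata; the stand-in call at the end is only there to show the calling convention.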
>
> In your case, you are defining a module which is going to be invoked
> like the following.
>
> Suppose that
>
> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'
>
> It will be invoked in a pipe similar to
>
> cat raw_message | /path/to/myarch.py HOST LIST
>
> i.e. the command string, with %(hostname)s and %(listname)s replaced by
> the actual host name and list name of the list, will be invoked and the
> message piped to it.
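
The substitution described here looks like Python's dictionary-based string formatting, so it can be sketched as below. This is only an illustration of the formatting step, not Mailman's actual code; the host name and list name are placeholder values. Note that each placeholder needs both parentheses, as in %(listname)s.

```python
# Sketch of the %(name)s substitution using Python's dict-based formatting.
# The host and list names below are placeholder values.
template = '/path/to/myarch.py %(hostname)s %(listname)s'
values = {'hostname': 'lists.example.com', 'listname': 'toto'}

command = template % values

# Inside myarch.py, sys.argv would then hold these fields in this order:
# argv[0] is the script, argv[1] the host name, argv[2] the list name.
argv = command.split()
```

With this ordering, a script that wants the list name has to read the second argument, not the first.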
>
> So, it could begin something like:
>
> #!python
> import sys
> sys.path.insert(0, 'path/to/mailman/bin')
> # The above line can be skipped if myarch.py is in Mailman's
> # bin directory.
> import paths
>
> import email
> from Mailman import MailList
> from Mailman import Message
>
> msg = email.message_from_file(sys.stdin, Message.Message)
> # With the command string above, argv[1] is the host name and
> # argv[2] is the list name.
> mlist = MailList.MailList(sys.argv[2], lock=True)
>
>
> At this point, you have a list object (locked) and a message object. You
> might think you could just do
>
> mlist.ArchiveMail(msg)
>
> to archive the mail to the listname.mbox file and the pipermail archive,
> but that wouldn't quite work because that method would re-invoke the
> external archiver. Also, you don't need to worry about the listname.mbox
> file because the ArchiveMail() method already did that before invoking
> the external archiver, so what you would need is
>
> from Mailman.Archiver import HyperArch
> from cStringIO import StringIO
> f = StringIO(str(msg))
> h = HyperArch.HyperArchive(mlist)
> h.processUnixMailbox(f)
> h.close()
> f.close()
>
> Which is what the ArchiveMail() method would do. Now you still have the
> mlist and msg objects, and you need to save and unlock the list at some
> point:
>
> mlist.Save()
> mlist.Unlock()
>
> and the message is now in the pipermail archive and can be indexed.
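
If the archiving step raises an exception before Unlock() is reached, the lock would presumably stay held until it expires. One common way to guard against that is a try/finally block. The sketch below shows only the pattern: FakeList is a hypothetical stand-in that mimics Save() and Unlock() so the example runs without Mailman, and archive_with_cleanup is an illustrative helper name, not a Mailman function.

```python
# Sketch only: FakeList stands in for the Mailman list object so the
# pattern can run without Mailman; only Save() and Unlock() are mimicked.
class FakeList(object):
    def __init__(self):
        self.locked = True   # pretend the list was opened with lock=True
        self.saved = False

    def Save(self):
        self.saved = True

    def Unlock(self):
        self.locked = False


def archive_with_cleanup(mlist, work):
    """Run work(), save on success, and always release the lock."""
    try:
        work()
        mlist.Save()
    finally:
        mlist.Unlock()


def failing_work():
    raise RuntimeError('archiving failed')


ok_list = FakeList()
archive_with_cleanup(ok_list, lambda: None)

bad_list = FakeList()
try:
    archive_with_cleanup(bad_list, failing_work)
except RuntimeError:
    pass  # the error still propagates, but the lock was released
```

In the real script, work() would be the HyperArchive calls, and the list object would come from MailList.MailList().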
>
Hello again,

I'm having some trouble with my code. Following what Mark said, I've written this:

#!/usr/bin/env python
import sys
sys.path.insert(0,'/usr/lib/mailman')

import syslog
syslog.syslog('begin script')

import email
from Mailman import MailList
from Mailman import Message

## archive part
from Mailman.Archiver import HyperArch
from cStringIO import StringIO

maillist = sys.argv[2]
hostname = sys.argv[1]

msg = email.message_from_file(sys.stdin, Message.Message)
syslog.syslog(maillist)
mlist = MailList.MailList(maillist, lock=True)

syslog.syslog('processing archiver')
## let archive it
f = StringIO(str(msg))
h = HyperArch.HyperArchive(mlist)
h.processUnixMailbox(f)
h.close()
f.close()

mlist.Save()
mlist.Unlock()
mlist.ArchiveMail(msg)

syslog.syslog('processing indexer')
### coming soon

syslog.syslog('exiting - all ok')
sys.exit(0)

The syslog calls are for debugging purposes only.

When I send an email to the list, I get this kind of error:

Mar 02 12:38:33 2010 (28380) toto.lock lifetime has expired, breaking
Mar 02 12:38:33 2010 (28380) File "/var/lib/mailman/scripts/driver", line 250, in <module>
Mar 02 12:38:33 2010 (28380) run_main()
Mar 02 12:38:33 2010 (28380) File "/var/lib/mailman/scripts/driver", line 110, in run_main
Mar 02 12:38:33 2010 (28380) main()
Mar 02 12:38:33 2010 (28380) File "/usr/lib/mailman/Mailman/Cgi/admin.py", line 167, in main
Mar 02 12:38:33 2010 (28380) mlist.Lock()
Mar 02 12:38:33 2010 (28380) File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
Mar 02 12:38:33 2010 (28380) self.__lock.lock(timeout)
Mar 02 12:38:33 2010 (28380) File "/usr/lib/mailman/Mailman/LockFile.py", line 306, in lock
Mar 02 12:38:33 2010 (28380) important=True)
Mar 02 12:38:33 2010 (28380) File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Mar 02 12:38:33 2010 (28380) traceback.print_stack(file=logf)

This block is repeated over and over in /var/log/mailman/locks.
It seems I have a problem with the lock file...

Any ideas?
Thank you!
--
Cédric Jeanneret | System Administrator
021 619 10 32 | Camptocamp SA
cedric.jeanneret at camptocamp.com | PSE-A / EPFL