[Mailman-Users] Integration with external search engine

Lukáš Vlček lukas.vlcek at gmail.com
Fri Dec 17 13:55:36 CET 2010


Short version - I have two questions:

1) How do I set up an external archiver so that email content gets indexed
by an external search engine?
2) How do I (re)index a mailing list's existing content with the external
search engine?

Longer version:

I am looking for a best-practice way to integrate Mailman with an external
search engine. I found the following wiki page [1], which links to the
Ext_Arch.py template, the brainchild of Mark Sapiro and Cedric Jeanneret
[2]. Cedric wanted to index emails using Xapian, and his implementation of
Ext_Arch.py can be found here [3]. This all looks very promising, but I
have a few questions/concerns:

It seems to me that the PUBLIC_EXTERNAL_ARCHIVER and
PRIVATE_EXTERNAL_ARCHIVER commands (both set in mm_cfg.py) are executed
only when a new message arrives; they are not executed when bin/arch is
run. So if a mailing list has been running on Mailman for a few years and
I now want to make its content searchable through a new external search
engine (such as Xapian), it is not enough to add an external archiver and
restart Mailman, because that would index only newly arriving messages.
Am I right?
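
For reference, the kind of setting I mean looks roughly like the fragment
below. This is only a sketch: the script path is hypothetical, and I am
assuming the Ext_Arch.py-style script takes the list name as its first
argument and reads the message on stdin. As far as I understand, Mailman
fills in %(listname)s before running the command.

```python
# Fragment for mm_cfg.py (Mailman 2.1). Illustrative values only.
# The script path is hypothetical; adjust it to wherever your
# Ext_Arch.py-style script is installed.
PUBLIC_EXTERNAL_ARCHIVER = '/usr/local/mailman/bin/Ext_Arch.py %(listname)s'
PRIVATE_EXTERNAL_ARCHIVER = '/usr/local/mailman/bin/Ext_Arch.py %(listname)s'
```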

How, then, can I get the old content of that mailing list indexed into
Xapian as well? bin/arch <maillist> does not do that, since it does not
execute external archivers. Moreover, running bin/arch can change the URLs
of individual public emails (renumbering), which is probably unacceptable.
So is there a way to iterate over the existing emails, parse them, and get
the existing URL for each one? (That information could then be used to
index the old content into the external search server without running
bin/arch.)
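
To make the iteration part concrete, here is a minimal sketch of what I
have in mind, using only the Python 3 standard library (so it would run as
a standalone script, not inside Mailman itself). It assumes the list's
cumulative mbox file is available, which on a Mailman 2.1 install is
normally archives/private/<maillist>.mbox/<maillist>.mbox, and that the
indexer is some command that reads one message on stdin. The function
names and the indexer command are hypothetical. It does not solve the URL
problem: it only walks the messages and does not know their Pipermail
URLs.

```python
import mailbox
import subprocess


def iter_raw_messages(mbox_path):
    """Yield each message in the mbox file as raw bytes, in file order."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            # get_bytes() returns the message without its "From " separator.
            yield box.get_bytes(key)
    finally:
        box.close()


def reindex(mbox_path, indexer_cmd):
    """Pipe every message to indexer_cmd on stdin and return the count.

    indexer_cmd is an argument list, e.g. a hypothetical
    ['/path/to/my_indexer.py', 'mylist'].
    """
    count = 0
    for raw in iter_raw_messages(mbox_path):
        # check=True stops the run if the indexer exits with an error.
        subprocess.run(indexer_cmd, input=raw, check=True)
        count += 1
    return count
```

Calling reindex() with the mbox path and the indexer command would feed
the whole history to the indexer once, without touching the existing
Pipermail pages. Mapping each message to its current archive URL is the
part I do not know how to do.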

Lukas Vlcek

[2] http://www.mail-archive.com/mailman-users@python.org/msg56679.html
