[Mailman-Developers] internal mail archiver/htdig integration

Richard Barrett R.Barrett@ftel.co.uk
Fri, 22 Sep 2000 11:53:28 +0100


In commissioning Mailman I needed to provide a per-list search 
facility which fully honored private list access constraints (i.e. if 
the user cannot read a message then search results must not even tell 
him it exists). In implementing this I have made some changes to the 
mailman-2.0beta5 source. What I have done is hardly rocket science. 
That said, it would help me if the work were integrated into the 
standard releases of Mailman so I do not have to keep applying my own 
unique patches to each subsequent release. I don't know the procedure 
for doing this, nor whether other developers think my work is worth 
incorporating into the mainline of Mailman's code. I've outlined the 
changes below and would appreciate some advice. BTW, if there's a 
better way to achieve the same objectives then I'm more than happy to 
throw my changes away.

I've integrated Mailman's internal archiver with htdig. Use of htdig 
requires three conditions: a new Mailman configuration variable is 
set, the internal archiver is in use, and the list is archived. 
If these conditions are met then the relevant list-specific htdig 
conf files and such are created and maintained automatically. If you 
don't want the integration, turning off a single Mailman config 
variable makes it completely transparent.

For qualifying lists the htsearch form is embedded in the list's 
archive index page, when that is generated by HyperArch.py, and hence 
is downstream of user authentication for accessing a private archive. 
Access to all URLs returned by htsearch is mediated by an 
additional wrapped CGI script called htdig.py, which is similar to 
private.py in that, for private lists, it requires the presence of a 
list authentication cookie or it rejects the access. For public lists 
it just responds with the requested HTML page. In the normal course of 
events the user's browser gets the authentication cookie on board 
while reaching the archive index page to do a search. This approach 
means that changing a list from public to private or vice versa does 
not invalidate the htdig search files, as access to archive HTML files 
in either type of list uses a common root in their URLs.
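The access decision htdig.py makes for each search hit might look roughly like the sketch below. It is not the actual htdig.py. The cookie name (the list name) and the simple token comparison are assumptions, and real Mailman cookie validation is more involved. The path check that keeps requests inside the list's archive directory is an extra safeguard not mentioned above.

```python
import os
from http.cookies import SimpleCookie


def resolve_archive_request(archive_root, list_name, rel_path,
                            is_private, cookie_header, valid_token):
    """Decide how a hypothetical htdig.py wrapper answers one request.

    archive_root: directory holding <list_name>/ HTML files (assumption)
    rel_path:     path below the list's archive, taken from the hit URL
    valid_token:  value the list's auth cookie must carry (assumption)
    Returns (http_status, file_path_or_None).
    """
    list_dir = os.path.realpath(os.path.join(archive_root, list_name))
    target = os.path.realpath(os.path.join(list_dir, rel_path))
    # Refuse paths that escape the list's archive directory (e.g. "../").
    if os.path.commonpath([list_dir, target]) != list_dir:
        return 403, None
    if is_private:
        cookies = SimpleCookie()
        cookies.load(cookie_header or "")
        morsel = cookies.get(list_name)  # hypothetical cookie name
        if morsel is None or morsel.value != valid_token:
            return 403, None  # no valid list cookie: reject the access
    if not os.path.isfile(target):
        return 404, None
    return 200, target  # public list, or private list with a valid cookie
```

Because public and private lists share the same URL root through the wrapper, only the `is_private` flag changes when a list switches type, and the indexed URLs stay valid.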

An additional directory called htdig is created under each 
.../mailman/archives/private/<listname> in which the htdig search 
files and configuration file for each list are stored.  An additional 
directory called .../mailman/archives/htdig holds symlinks to the 
list-specific htdig configuration files. .../mailman/archives/htdig 
is itself the target of a symlink in the directory in which htdig is 
configured to find its configuration files. [These symlinks 
allow htsearch to locate the list-specific htdig conf files.] This 
directory arrangement ensures that deleting a list in the normal 
fashion also disposes of the related search files without any further 
effort or thought.
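The directory and symlink layout described above can be sketched as follows. The `<list>.conf` file name and the `mailman` link name inside htdig's config directory are hypothetical. The function only creates directories and links and does not write the conf file itself.

```python
import os


def setup_htdig_dirs(archives_dir, list_name, htdig_config_dir):
    """Create a list's htdig directory plus the two levels of symlinks.

    archives_dir:     the .../mailman/archives directory
    htdig_config_dir: directory where htdig/htsearch look for config files
    The "<list>.conf" and "mailman" names are assumptions for illustration.
    Returns the path where the list's htdig conf file belongs.
    """
    # Search files live under the list's private archive, so removing that
    # directory also removes them.
    list_htdig = os.path.join(archives_dir, "private", list_name, "htdig")
    os.makedirs(list_htdig, exist_ok=True)
    conf_path = os.path.join(list_htdig, list_name + ".conf")

    # .../archives/htdig collects one symlink per list conf file.
    link_dir = os.path.join(archives_dir, "htdig")
    os.makedirs(link_dir, exist_ok=True)
    conf_link = os.path.join(link_dir, list_name + ".conf")
    if not os.path.islink(conf_link):
        os.symlink(conf_path, conf_link)

    # htdig's own config directory holds one symlink to .../archives/htdig.
    dir_link = os.path.join(htdig_config_dir, "mailman")
    if not os.path.islink(dir_link):
        os.symlink(link_dir, dir_link)
    return conf_path
```

Calling it again for the same list is harmless, because existing links are left in place.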

The whole is rounded off with a cron-initiated script which uses 
rundig to update list-specific search files on a regular basis, but 
only for those lists whose archive has had more data added to it since 
the script last ran.
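Below is a minimal sketch of the "only if new data" logic. It assumes changes are detected by comparing file modification times with a stamp file. The post does not say how nightly_htdig decides, and the stamp file, function names and conf path are hypothetical. rundig is assumed to accept htdig's `-c configfile` option. The runner is injectable so the logic can be exercised without htdig installed.

```python
import os
import subprocess
import time


def lists_needing_reindex(archives_dir, stamp_file):
    """Return names of lists with archive files newer than the stamp file."""
    last_run = os.path.getmtime(stamp_file) if os.path.exists(stamp_file) else 0
    private = os.path.join(archives_dir, "private")
    stale = []
    for name in sorted(os.listdir(private)):
        list_dir = os.path.join(private, name)
        # Skip plain files and the sibling <list>.mbox directories.
        if name.endswith(".mbox") or not os.path.isdir(list_dir):
            continue
        newest = 0
        for dirpath, dirnames, filenames in os.walk(list_dir):
            # Ignore the htdig directory itself: rundig writes into it.
            dirnames[:] = [d for d in dirnames if d != "htdig"]
            for fname in filenames:
                newest = max(newest, os.path.getmtime(os.path.join(dirpath, fname)))
        if newest > last_run:
            stale.append(name)
    return stale


def reindex(archives_dir, stamp_file, run=subprocess.run):
    """Run rundig for each changed list, then advance the stamp file."""
    started = time.time()  # files modified after this are caught next run
    for name in lists_needing_reindex(archives_dir, stamp_file):
        conf = os.path.join(archives_dir, "private", name, "htdig", name + ".conf")
        run(["rundig", "-c", conf], check=True)
    open(stamp_file, "a").close()  # create the stamp file if missing
    os.utime(stamp_file, (started, started))
```

Setting the stamp to the start time, not the finish time, means a message archived while rundig is running will still trigger a reindex on the next night.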

Changes have been made to the following mailman-2.0beta5 source files:

HyperArch.py

1. Added a function that sets up list-specific htdig: it creates the 
list's htdig directory and generates the list's htdig conf file in 
that directory. Logic is added so that this function is called when 
archiving for the list is commenced.
2. Added meta tags and <!--htdig_noindex--> tags to the HTML 
templates to improve the quality of search results and the efficiency 
of htdig'ing.
3. Added an htsearch form HTML template and logic to selectively 
include it when generating list index pages.
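Items 1 to 3 can be illustrated with two small generators. This is not the HyperArch.py code. It uses the htdig attributes `database_dir`, `start_url`, `limit_urls_to` and `maintainer`, the htsearch form fields `config`, `method` and `words`, and htdig's `<!--htdig_noindex-->` / `<!--/htdig_noindex-->` markers. The exact values, the `/cgi-bin/htsearch` action URL and the choice of the list name as the `config` value are assumptions. `archive_url` is meant to be the URL through which htdig.py serves the list's archive.

```python
import html


def htdig_conf_text(list_name, archive_url, htdig_dir, maintainer):
    """Build a list-specific htdig conf file body (hypothetical values)."""
    # start_url / limit_urls_to keep the dig inside this list's archive;
    # database_dir puts the search files in the list's own htdig directory.
    return (
        "database_dir: %s\n"
        "start_url: %s\n"
        "limit_urls_to: %s\n"
        "maintainer: %s\n" % (htdig_dir, archive_url, archive_url, maintainer)
    )


# Hypothetical template; the noindex markers keep the form's own text out
# of the search index.
SEARCH_FORM = """<!--htdig_noindex-->
<form method="get" action="%(action)s">
<input type="hidden" name="config" value="%(config)s">
<input type="hidden" name="method" value="and">
<input type="text" name="words" size="30">
<input type="submit" value="Search %(listname)s">
</form>
<!--/htdig_noindex-->
"""


def index_page_search_form(list_name, use_htdig, htsearch_url="/cgi-bin/htsearch"):
    """Return the form HTML for the archive index page, or "" if disabled."""
    if not use_htdig:
        return ""
    esc = html.escape(list_name, quote=True)
    return SEARCH_FORM % {"action": html.escape(htsearch_url, quote=True),
                          "config": esc, "listname": esc}
```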

src/Makefile

Added htdig to the list of CGI scripts for which wrappers are generated, so the new htdig.py gets one.

Defaults.py

Added settings to control the htdig integration, including 
USE_HTDIG, which enables or disables all the other changes depending 
on whether you want to use htdig in this way.
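A Defaults.py fragment for these settings might look like the sketch below. A site would then override the values in mm_cfg.py in the usual Mailman way. Only `USE_HTDIG` comes from this post. Every other name, path and default value here is a hypothetical placeholder.

```python
# Defaults.py fragment (sketch). Only USE_HTDIG is named in the post; the
# other names and all paths are hypothetical placeholders.
import os

PREFIX = "/home/mailman"  # placeholder for Mailman's install prefix

# Master switch: 0 leaves archiving exactly as it was without htdig.
USE_HTDIG = 0

# Location of the rundig script and of htdig's own config directory.
HTDIG_RUNDIG_PATH = "/usr/local/bin/rundig"
HTDIG_CONF_DIR = "/etc/htdig"

# URL of the htsearch CGI used by the form on archive index pages.
HTSEARCH_URL = "/cgi-bin/htsearch"

# Directory of per-list conf symlinks (.../mailman/archives/htdig).
HTDIG_CONF_LINK_DIR = os.path.join(PREFIX, "archives", "htdig")
```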

New files are:

CGI script htdig.py, which mediates access to all htsearch results

cron-activated script nightly_htdig, which selectively regenerates htdig search files

Some tidying up of the installer may yet be needed to finalise this 
work, plus some minor changes to the installation documentation.
------------------------------------------------------------------
Richard Barrett, PostPoint 27,         e-mail:r.barrett@ftel.co.uk
Fujitsu Telecommunications Europe Ltd,      tel: (44) 121 717 6337
Solihull Parkway, Birmingham Business Park, B37 7YU, England
"Democracy is two wolves and a lamb voting on what to have for
lunch. Liberty is a well armed lamb contesting the vote."
Benjamin Franklin, 1759
------------------------------------------------------------------