[Mailman-Developers] internal mail archiver/htdig integration

22 Sep 2000

While commissioning Mailman I needed to provide a per-list search
facility which fully honors private list access constraints (i.e. if
the user cannot read a message then search results should not reveal
that it even exists). To implement this I have made some changes to the
mailman-2.0beta5 source. What I have done is hardly rocket science.
That said, it would be good for me if the work were integrated into the
standard releases of Mailman, so that I do not have to keep applying my
own patches to each subsequent release. I don't know the protocol
for doing this, nor whether other developers think my work is worth
incorporating into the mainline of Mailman's code. I've outlined the
changes below and would appreciate some advice. BTW: if there's a
better way to achieve the same objectives then I'm more than happy to
throw my changes away.

I've integrated Mailman's internal archiver with htdig. Use of htdig
is conditional on three things: a new Mailman configuration variable
being set, the internal archiver being in use, and the list being
subject to archiving. If these conditions are met then the relevant
list-specific htdig conf files and such are created and maintained
automatically. If you don't want the integration then turning off that
single Mailman config variable leaves Mailman behaving as if the
changes were not there.
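
As a rough illustration of that gating, the three conditions could be combined as below. This is only a sketch in present-day Python, not the patch itself: the use_htdig argument stands for the new configuration variable, and ListSettings with its field names is a hypothetical stand-in rather than Mailman's actual list attributes.

```python
from dataclasses import dataclass


@dataclass
class ListSettings:
    """Hypothetical stand-in for the archiving settings of one list."""
    archive: bool            # the list is subject to archiving
    internal_archiver: bool  # archiving is done by Mailman's internal archiver


def htdig_enabled(use_htdig: bool, settings: ListSettings) -> bool:
    """Return True only when all three conditions hold for this list."""
    return use_htdig and settings.internal_archiver and settings.archive
```

With the site-wide switch off, htdig_enabled() is False for every list, whatever the per-list settings are.
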
For qualifying lists the htsearch form is embedded in the list's
archive index page when that page is generated by HyperArch.py, and
hence sits downstream of user authentication for a private archive.
Access to the URLs returned by htsearch is mediated by an
additional wrapped CGI script called htdig.py, which is similar to
private.py in that, for private lists, it requires
the presence of a list authentication cookie or it rejects the
access. For public lists it just responds with the HTML page
requested. In the normal course of events the user's browser picks up
the authentication cookie on its way to the archive index page, from
which the search is made. This approach means that
changing a list from public to private or vice versa does not
invalidate the htdig search files, because access to archive HTML files
in either type of list uses a common root in their URLs.
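
The access rule described above can be sketched as follows. This is not the code of htdig.py; the function name, its arguments and the use of a 403 status for a rejected request are all assumptions made for illustration (the text says only that access is rejected).

```python
from typing import Callable, Tuple


def serve_archive_page(
    list_is_private: bool,
    has_valid_cookie: bool,
    load_page: Callable[[], str],
) -> Tuple[int, str]:
    """Return (HTTP status, body) for one archive page request.

    Private lists require a valid list authentication cookie;
    public lists are served unconditionally.
    """
    if list_is_private and not has_valid_cookie:
        # Assumption: rejection is reported as 403 Forbidden.
        return 403, "Access denied"
    # load_page is only called once access has been granted.
    return 200, load_page()
```

Because the same entry point serves both kinds of list, the URLs stored in the search files stay valid when a list is switched between public and private.
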
An additional directory called htdig is created under each
.../mailman/archives/private/<listname>, in which the htdig search
files and configuration file for that list are stored. A further
directory, .../mailman/archives/htdig, holds symlinks to the
list-specific htdig configuration files. .../mailman/archives/htdig
is itself the target of a symlink in the directory in which htdig is
configured to find its configuration files. [These symlinks
allow htsearch to locate the list-specific htdig conf files.] This
directory arrangement means that deleting a list in the normal
fashion also disposes of its search files without any further
effort or thought.
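
A minimal sketch of that layout, assuming a POSIX filesystem. The function name and the <listname>.conf file name are hypothetical, and the final symlink from htdig's own configuration directory to .../mailman/archives/htdig is left out because its location depends on how htdig was installed.

```python
import os


def setup_list_htdig(archives_root: str, listname: str) -> str:
    """Create the per-list htdig directory and link its conf file.

    archives_root plays the role of .../mailman/archives.
    Returns the path of the symlink created under archives_root/htdig.
    """
    list_htdig_dir = os.path.join(archives_root, "private", listname, "htdig")
    links_dir = os.path.join(archives_root, "htdig")
    os.makedirs(list_htdig_dir, exist_ok=True)
    os.makedirs(links_dir, exist_ok=True)

    # Assumption: the conf file is named after the list.
    conf_path = os.path.join(list_htdig_dir, listname + ".conf")
    if not os.path.exists(conf_path):
        # Empty placeholder; the real conf contents are not shown here.
        open(conf_path, "w").close()

    link_path = os.path.join(links_dir, listname + ".conf")
    if not os.path.islink(link_path):
        os.symlink(conf_path, link_path)
    return link_path
```

Calling the function a second time for the same list is harmless: the directories, conf file and symlink are only created if they are missing.
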
The whole is rounded off with a cron-initiated script which uses
rundig to update the list-specific search files on a regular basis,
but only for those lists whose archive has had more data added to it
since the script last ran.
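
One way such a selective update could work is sketched below. How the script actually detects new data is not stated here, so comparing the archive file's modification time against a per-list timestamp file is an assumption, as are the function names. rundig's -c option names the configuration file to use.

```python
import os
import subprocess


def archive_changed(archive_file: str, stamp_file: str) -> bool:
    """True if the archive was modified after the last recorded run."""
    if not os.path.exists(stamp_file):
        return True  # never indexed before
    return os.path.getmtime(archive_file) > os.path.getmtime(stamp_file)


def update_index(conf_file, archive_file, stamp_file, run=subprocess.run):
    """Run rundig for one list if its archive has new data.

    Returns True if rundig was invoked, False if the list was skipped.
    The run parameter exists so the command can be replaced in tests.
    """
    if not archive_changed(archive_file, stamp_file):
        return False
    run(["rundig", "-c", conf_file], check=True)
    # Record the time of this run by rewriting the (empty) stamp file.
    open(stamp_file, "w").close()
    return True
```

The cron job would then loop over the lists and call update_index() once per list, so lists with no new mail cost only two stat calls.
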
Changes have been made to the following mailman-2.0beta5 source files:

HyperArch.py
- added a function to set up list-specific htdig: it creates the list's
  htdig directory and generates the list's htdig conf file in that
  directory. Logic added so that this function is called when
  archiving for the list is commenced.
- added meta tags and <!--htdig_noindex--> tags in the HTML
  templates to improve the quality of search results and the
  efficiency of htdig'ing
- added an htsearch form HTML template, and logic to selectively
  include it when generating list index pages

src/Makefile
- added htdig to the list of CGI scripts, so that a wrapper for the new
  htdig.py is generated

Defaults.py
- added directives to provide control of the htdig integration. These
  include USE_HTDIG, which enables/disables all the other changes
  depending on whether you want to use htdig in this way.

New files are:
- CGI script htdig.py, which mediates access to all htsearch results
- cron-activated script nightly_htdig, which selectively regenerates
  the htdig search files

Some tidying up of the installer may yet be needed to finalise this
work, plus some minor changes to the installation documentation.

Richard Barrett, PostPoint 27,         e-mail:r.barrett@ftel.co.uk
Fujitsu Telecommunications Europe Ltd,      tel: (44) 121 717 6337
Solihull Parkway, Birmingham Business Park, B37 7YU, England
"Democracy is two wolves and a lamb voting on what to have for
lunch. Liberty is a well armed lamb contesting the vote."
Benjamin Franklin, 1759
