In commissioning Mailman I needed to provide a per-list search facility that fully honored private list access constraints (i.e. if the user cannot read a message, search results should not even reveal that it exists). To implement this I have made some changes to the mailman-2.0beta5 source. What I have done is hardly rocket science. That said, it would be good for me if the work were integrated with the standard releases of Mailman, so that I do not have to keep applying my own patches to each subsequent release. I don't know the protocol for doing this, nor whether other developers think my work is worth incorporating into the mainline of Mailman's code. I've outlined the changes below and would appreciate some advice. BTW: if there's a better way to achieve the same objectives then I'm more than happy to throw my changes away.
I've integrated Mailman's internal archiver with htdig. The integration applies to a list only if three conditions hold: a new Mailman configuration variable is set, the internal archiver is in use, and the list is being archived. If these conditions are met then the relevant list-specific htdig conf files and such are created and maintained automatically. If you don't want the integration then turning off a single Mailman config variable makes it completely transparent.
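As a rough illustration of the three conditions above (this is not code from the patch), the check could be sketched as below. The class and field names are hypothetical stand-ins; only USE_HTDIG is a name taken from the changes described in this message, and the sketch uses modern Python rather than the Python of mailman-2.0beta5.

```python
from dataclasses import dataclass


@dataclass
class SiteConfig:
    """Hypothetical stand-in for site-wide Mailman settings."""
    use_htdig: bool               # plays the role of USE_HTDIG
    uses_internal_archiver: bool  # True when the internal archiver is in use


@dataclass
class MailingList:
    """Hypothetical stand-in for a Mailman list object."""
    name: str
    archived: bool                # True when the list is subject to archiving


def htdig_applies(site: SiteConfig, mlist: MailingList) -> bool:
    """Return True only when all three conditions for the integration hold."""
    return site.use_htdig and site.uses_internal_archiver and mlist.archived
```

With the site-wide switch off, htdig_applies() is False for every list, which is the sense in which a single variable disables the whole integration.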
For qualifying lists the htsearch form is embedded in the list's archive index page when that page is generated by HyperArch.py, and hence is downstream of user authentication for accessing a private archive. Access to the URLs returned by htsearch is mediated by an additional wrapped cgi script called htdig.py, which is similar to private.py in the sense that for private lists it requires the presence of a list authentication cookie or it rejects the access. For public lists it just responds with the html page requested. In the normal course of events the user's browser picks up the authentication cookie in the normal fashion while reaching the archive index page to do a search. This approach means that changing a list from public to private or vice versa does not invalidate the htdig search files, as access to archive html files in either type of list uses a common root in their URLs.
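The access rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the contents of htdig.py: the ARCHIVE_ROOT value, the function names and the path-containment check are all hypothetical, and cookie validation is reduced to a boolean supplied by the caller.

```python
import posixpath

# Hypothetical location of the per-list archive directories.
ARCHIVE_ROOT = "/var/mailman/archives/private"


def resolve_archive_path(listname, relpath):
    """Map a requested page to a file under the list's archive directory.

    Raises ValueError if the normalized path would leave that directory.
    """
    base = posixpath.join(ARCHIVE_ROOT, listname)
    full = posixpath.normpath(posixpath.join(base, relpath))
    if full != base and not full.startswith(base + "/"):
        raise ValueError("path escapes the list archive")
    return full


def handle_request(listname, relpath, is_private, has_valid_cookie):
    """Decide whether to serve a page that htsearch returned.

    Private lists require an authentication cookie; public lists do not.
    Both kinds of list resolve pages under the same root.
    """
    if is_private and not has_valid_cookie:
        return ("reject", None)
    return ("serve", resolve_archive_path(listname, relpath))
```

Because the public and private cases differ only in the cookie check and not in the path, flipping a list between the two leaves every indexed URL valid.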
An additional directory called htdig is created under each .../mailman/archives/private/<listname>, in which the htdig search files and configuration file for that list are stored. An additional directory called .../mailman/archives/htdig holds symlinks to the list-specific htdig configuration files. .../mailman/archives/htdig is itself the target of a symlink in the directory in which htdig is configured to find its configuration files. [These symlinks are to allow htsearch to locate the list-specific htdig conf files.] This directory arrangement means that deleting a list in the normal fashion also disposes of the related search files without any further effort or thought.
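A minimal sketch of this directory arrangement, assuming a conf file named <listname>.conf (the file name, the function name and the placeholder file contents are assumptions made for illustration; the real conf file is generated by the patched archiver):

```python
import os


def setup_htdig_dirs(archive_root, listname):
    """Create the per-list htdig directory and the shared symlink to its conf.

    archive_root plays the role of .../mailman/archives. Returns the pair
    (conf_path, link_path). Safe to call more than once.
    """
    # Per-list directory holding the search files and the conf file.
    list_htdig_dir = os.path.join(archive_root, "private", listname, "htdig")
    os.makedirs(list_htdig_dir, exist_ok=True)

    conf_path = os.path.join(list_htdig_dir, listname + ".conf")
    if not os.path.exists(conf_path):
        with open(conf_path, "w") as f:
            f.write("# placeholder for the generated htdig configuration\n")

    # Shared directory of symlinks, one per list, so htsearch can find confs.
    links_dir = os.path.join(archive_root, "htdig")
    os.makedirs(links_dir, exist_ok=True)
    link_path = os.path.join(links_dir, listname + ".conf")
    if not os.path.islink(link_path):
        os.symlink(conf_path, link_path)

    return conf_path, link_path
```

Everything list-specific except the symlink lives under the list's own archive directory, so removing that directory removes the search files along with the archive.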
The whole is rounded off with a cron-initiated script which uses rundig to update the list-specific search files on a regular basis, if and only if the given list's archive has had more data added to it since the script last ran.
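One way such a conditional update could work is sketched below. It is an assumption-laden illustration, not the cron script itself: detecting new data by comparing the archive file's modification time against a stamp file is a guess at the mechanism, the function and parameter names are hypothetical, and the rundig invocation with -c to name the conf file should be checked against the local htdig installation.

```python
import os
import subprocess
import time


def archive_changed_since(archive_path, stamp_path):
    """True if the archive is newer than the stamp, or no stamp exists yet."""
    if not os.path.exists(stamp_path):
        return True
    return os.path.getmtime(archive_path) > os.path.getmtime(stamp_path)


def update_index(archive_path, stamp_path, conf_path, run=subprocess.check_call):
    """Re-run the indexer for one list only if its archive has changed.

    Returns True if the indexer was run, False if it was skipped.
    """
    if not archive_changed_since(archive_path, stamp_path):
        return False
    # Record the start time so mail archived during indexing is not missed
    # on the next run.
    started = time.time()
    run(["rundig", "-c", conf_path])  # assumed invocation
    with open(stamp_path, "w"):
        pass
    os.utime(stamp_path, (started, started))
    return True
```

The run parameter exists only so the command can be replaced in a dry run or a test; a cron job would call update_index() once per qualifying list.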
Changes have been made to the following mailman-2.0beta5 source files:
HyperArch.py
- extra function to set up list-specific htdig: it creates the list's htdig directory and generates the list's htdig conf file in that directory. Logic added so that this function is called when archiving for the list is commenced.
- added meta tags and <!--htdig_noindex--> tags in the html templates to improve the quality of search results and the efficiency of htdig'ing
- added htsearch form html template and logic to selectively include that when generating list index pages
src/Makefile
Added htdig to the list of cgi scripts so that a wrapper is generated for the new htdig.py
Defaults.py
Added directives to provide control of the htdig integration. These include USE_HTDIG, which enables or disables all the other changes depending on whether you want to use htdig in this way.
New files are:
cgi script htdig.py, which mediates access to all htsearch results
cron-activated script nightly_htdig, which selectively regenerates the htdig search files