[Mailman-Users] searchable archives?

Richard Barrett R.Barrett at ftel.co.uk
Mon Oct 15 18:37:49 CEST 2001


At 15:45 11/10/2001 -0400, Jimmy McDonald wrote:

>I have a couple more questions before I start trying this out.
>
>The mailman server is an internal server and cannot
>be accessed from the outside world so I plan to
>push the archives outside the firewall every night.

These comments refer to the patches listed below. As a general comment,
your proposed approach will not fit well with using the patches.

If you intend to copy the html version of your mail archives in this way 
there are a number of issues to consider:

1. You may want to rewrite the html pages as you copy them because some of 
them (e.g. index.html, subject.html, thread.html and date.html) have 
installation specific information (extracted from 
$prefix/Mailman/Defaults.py e.g. DEFAULT_URL) baked into them. These URLs 
will presumably be invalid once the pages have been copied.

2. Sort out your approach to controlling access to these archives. For 
example, access to private Mailman archives is normally mediated by the 
$prefix/Mailman/Cgi/private.py script (i.e. this script is called to return
archived e-mail from private archives) which in turn refers to the Mailman 
'database' of information for the list containing subscriber ids and 
passwords. These mechanisms will presumably not be available on the machine 
outside/on the firewall which will be serving the copied archive data. The 
patches strive to retain the standard Mailman privacy protection. The URLs 
returned by htdig searches will all make the browser go via the 
$prefix/Mailman/Cgi/htdig.py script which imposes the same access 
restrictions as private.py (which happens to be none in the case of public 
archives). Again, this script presumably won't be available on your machine 
outside/on the firewall.
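
As a rough sketch of point 1 above: the copy step could rewrite the
baked-in base URL as it goes. This is only an illustration, not part of
Mailman or the patches. The function name copy_archive and both URLs in
the usage comment are made up; substitute whatever DEFAULT_URL is set to
on your internal server and the address of the external one.

```python
import os
import shutil


def copy_archive(src_root, dst_root, old_base, new_base):
    """Copy an archive tree, replacing old_base with new_base in .html files.

    Returns the number of html files whose content was changed.
    All names here are hypothetical; nothing below is Mailman API.
    """
    old_bytes = old_base.encode("ascii")
    new_bytes = new_base.encode("ascii")
    rewritten = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if name.endswith(".html"):
                # Work on raw bytes so the page encoding is left untouched.
                with open(src, "rb") as fh:
                    data = fh.read()
                new_data = data.replace(old_bytes, new_bytes)
                with open(dst, "wb") as fh:
                    fh.write(new_data)
                if new_data != data:
                    rewritten += 1
            else:
                # Non-html files (attachments and so on) are copied as-is.
                shutil.copy2(src, dst)
    return rewritten


# Example call (paths and URLs are placeholders):
# copy_archive("/path/to/archive", "/path/to/staging",
#              "http://inside.example/mailman/",
#              "http://outside.example/mailman/")
```

A plain string replacement like this only deals with the URLs. It does
nothing about the access control issue in point 2.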


>If I install mailman with the htdig patch on my web
>server (and not run sendmail) will this work or does the

I can't see the relevance of the MTA to the issue.

The patches below are designed to seamlessly integrate search with the 
standard Mailman means of accessing either private or public html mail 
archives produced by Mailman's built-in archiver, pipermail.

You could use htdig or any other search engine to provide search facilities 
for html archives you have transferred outside your firewall but then the 
situation is sufficiently different that the patches have limited relevance.

>htdig information get processed as a message is archived?
>If so, can the htdig database be pushed as well?

The patches use htdig in a fairly conventional way and include a
cron-activated script which runs htdig daily to update its indexes of the
mail archives. One of the specialisations is that the patches automatically
create an htdig config file per mail list. This leads to per-list htdig
indexes and per-list searches, which is how list privacy is maintained. If
you are going to move public list archives outside, you may not want this
fine-grained control over search and access.
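
To give a feel for what "an htdig config file per mail list" means, here
is a minimal sketch. It is not the template the patches actually generate.
database_dir, start_url and limit_urls_to are standard htdig configuration
attributes, but the function name, list name, URL and directory are
assumptions chosen for the example.

```python
def render_htdig_conf(listname, archive_url, db_root):
    """Return the text of a minimal per-list htdig config file.

    Each list gets its own database_dir, so its index is kept separate
    from every other list's index. Hypothetical helper, not patch code.
    """
    lines = [
        "# htdig configuration for list: %s" % listname,
        "database_dir: %s/%s" % (db_root, listname),
        "start_url: %s" % archive_url,
        # Keep the crawl inside this one list's archive.
        "limit_urls_to: ${start_url}",
    ]
    return "\n".join(lines) + "\n"


# Example call (all values are placeholders):
# text = render_htdig_conf("mylist",
#                          "http://lists.example.org/pipermail/mylist/",
#                          "/var/lib/htdig")
```

Because each list has its own index, a search against one list cannot
return hits from another list's archive.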

If you are going to proceed as you suggest and transfer archives from the
inside server to the outside server each night, then you might just as well
set up htdig/your-search-engine-of-choice yourself and build the indexes on
the external server each night after doing the data transfer.
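
A nightly job along those lines might look something like the sketch
below. It assumes rsync and ssh are available on the internal server and
that htdig's rundig script is installed on the external one. The host
name, paths and function names are invented for the example.

```python
import subprocess


def nightly_commands(src_dir, remote_host, remote_dir, htdig_conf):
    """Build the two commands: push the archives, then reindex remotely.

    htdig_conf is the path of an htdig config file on the remote host.
    All arguments are placeholders for your own setup.
    """
    return [
        # Mirror the html archives to the external server.
        ["rsync", "-a", "--delete", src_dir, "%s:%s" % (remote_host, remote_dir)],
        # Rebuild the search index on the external server after the copy.
        ["ssh", remote_host, "rundig", "-c", htdig_conf],
    ]


def run_nightly(commands, runner=subprocess.check_call):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd)


# Example, e.g. called from a script started by cron:
# run_nightly(nightly_commands("/path/to/archives/",
#                              "web.example.org",
#                              "/path/to/docroot/archives/",
#                              "/path/to/htdig.conf"))
```

Running the index step only after the transfer has succeeded means the
index never refers to pages that failed to arrive.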

Hope the above helps.

>Thanks,
>Jimmy
>
>
>>>I have a mailman list that keeps archives and
>>>someone wants me to make those archives
>>>searchable via a web browser.
>>>
>>>Is there a way mailman can do this?
>>>Any other thoughts?
>>>
>>>Thanks,
>>>Jimmy
>>>
>>>
>>>------------------------------------------------------
>>>Mailman-Users maillist  -  Mailman-Users at python.org
>>>http://mail.python.org/mailman/listinfo/mailman-users
>>
>>The following patches integrate the htdig (http://www.htdig.org/) search 
>>engine with Mailman.
>>
>>http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103
>>
>>http://sourceforge.net/tracker/index.php?func=detail&aid=444884&group_id=103&atid=300103
>>
>>The main features of the patch, from its sourceforge summary are:
>>
>>1. per list search facility with a search form on the
>>list's TOC page.
>>
>>2. maintenance of privacy of private archives which
>>requires the user to establish their credentials via
>>the normal private archive access before any access
>>via htdig is allowed.
>>
>>3. a common base URL for both public and private
>>archive access via htsearch results so that htdig
>>indices are unaffected by changing an archive from
>>private to public and vice versa. All access to
>>archives via htdig is controlled by a new wrapped cgi-
>>bin script called htdig.py.
>>
>>4. a new cron activated script and extra crontab entry
>>which runs htdig regularly to maintain the per list
>>search indices.
>>
>>5. automatic creation, deletion and maintenance of
>>htdig configuration files and such. Beyond installing
>>htdig and telling Mailman where it is via mm_cfg you
>>do not have to do any other setup. Well not quite you
>>do have to set up a single per installation symlink to
>>allow htdig to find the automatically generated per
>>list htdig configuration files.
>
>
>------------------------------------------------------
>Mailman-Users maillist  -  Mailman-Users at python.org
>http://mail.python.org/mailman/listinfo/mailman-users
