[Mailman-Users] Mailman & Htdig integration (with external archiver)

Tue Jan 15 17:09:09 CET 2002

At 14:33 15/01/2002 +0100, Sasa Janiska wrote:
>On Today, -0000, Richard Barrett wrote:
>
>Hi Richard!
>Thank you very much for your reply.
>
> > This is a straight htdig configuration issue. At the minimum you will have
> > to add start_url directives to htdig's conf file for each of the list
> > archives or ensure that links from one of the start_url directives in
> > htdig's conf file eventually lead to each of the list archives. You will
> > also have to have some sort of cron job to rebuild htdig's search indices
> > regularly (probably daily) to include new archived material.
>
>That's easy.
>
> > The following patches can be applied to Mailman 2.0.8 (and earlier
> > versions of 2.0.x) to integrate htdig with Mailman and provide search of
> > archives generated by the internal (pipermail) archiver.
>
>Do you have something ready for v2.1?

I have already posted versions of the patch for MM 2.1a3 and MM 2.1cvs on
SourceForge. The latter is against the MM CVS as of the date and time noted
in the posting, but it may need updating depending on what has changed in
CVS since then. I intend to publish versions of the patch for the beta and
final releases of MM 2.1 as soon as I can after they become available. Just
check SourceForge for Mailman patches 444879 and 444884 and read the notes
I post with each patch file.

> > The patches are not of direct relevance if you have opted to use an
> > external archiver.
>
>If pipermail can do the job, it isn't necessary. I am thinking about an
>external archiver, seeing that pipermail is no longer maintained...

In the context of Mailman I think it can be said that pipermail is still
being maintained. MM contains its own copy of the pipermail code in Python,
and if you search the developer archives you will see there is ongoing work
and discussion about its future. The archiver will certainly be enhanced and
maintained through MM 2.1, although the enhancements may not be that major.
Do you do Python? Maybe you could make a contribution!

> > The benefit of the integration of htdig with Mailman archives generated by
> > pipermail is that it provides per list search facilities with a search form
> > on each list's archive TOC page and uses Mailman's security mechanism for
> > limiting access to private mail archives via search responses; in fact you
> > can only access a private list archive's search form if you are authorised
> > to access the list. The patches also automatically build htdig config
> > files for each archived list and provides cron scripts for maintaining
> > htdig's search indices.
>
>Limiting access to private list archives is very important.
>Actually, only students should have access to the mailing lists, and
>only for those courses they are enrolled in.

If you go with the external archiver, I guess you will have to apply
authentication and access control via the web server used to access the
archives it produces. You may want to consider how to automate keeping the
access control data for each private list's archive (in whatever format the
web server uses) in step with the subscription information held by MM.
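As an illustration of that automation, here is a minimal sketch that renders an Apache AuthGroupFile (one "group: user user ..." line per group) from per-list membership data and swaps it into place atomically. It assumes, hypothetically, that each private list maps to a web-server group of the same name and that members log in with their email address; how the membership dict gets filled (for example from Mailman's bin/list_members run out of cron) is left open.

```python
import os
import tempfile
from typing import Dict, Iterable


def build_group_file(memberships: Dict[str, Iterable[str]]) -> str:
    """Render an Apache AuthGroupFile: one 'group: user user ...' line per list.

    Assumption (hypothetical): each list becomes a web-server group of the
    same name, and members log in with their email address as the username.
    """
    lines = []
    for listname in sorted(memberships):
        # Normalise, drop blanks and duplicates, and sort for a stable file.
        members = sorted({m.strip().lower() for m in memberships[listname] if m.strip()})
        lines.append(listname + ": " + " ".join(members))
    return "\n".join(lines) + "\n" if lines else ""


def write_atomically(path: str, text: str) -> None:
    """Replace path in one step so the web server never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        # mkstemp creates the file 0600; make it readable by the web server user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Note that a group file only covers authorisation; the passwords themselves would still live in a separate AuthUserFile that needs the same kind of synchronisation.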

As an aside, the htdig/MM integration I produced uses per-list search forms
embedded in the list archive TOC page, together with per-list htdig config
files and per-list search indexes. The primary reason is that this
authenticates the user before the search is done and prevents unauthorised
users from seeing links and synopsis information which they are not
entitled to access.
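To make the per-list arrangement concrete, the following is a minimal sketch (not the code in the patch) that generates one htdig config file per list using the htdig attributes database_dir, start_url and limit_urls_to. The archive base URL, database root and config directory are hypothetical site-layout assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def htdig_conf_for_list(listname: str, archive_base_url: str, db_root: str) -> str:
    """Return the text of an htdig config file that indexes one list's archive.

    archive_base_url and db_root describe a hypothetical site layout, e.g.
    "http://lists.example.org/pipermail" and "/var/lib/htdig/lists".
    """
    url = archive_base_url.rstrip("/") + "/" + listname + "/"
    lines = [
        "# Generated htdig config for list " + listname,
        # Each list gets its own database directory, hence its own index.
        "database_dir: " + str(Path(db_root) / listname),
        # Start crawling at the list's archive TOC...
        "start_url: " + url,
        # ...and never follow links out of that list's archive.
        "limit_urls_to: " + url,
    ]
    return "\n".join(lines) + "\n"


def write_list_confs(lists: Iterable[str], archive_base_url: str,
                     db_root: str, conf_dir: str) -> List[Path]:
    """Write <conf_dir>/<list>.conf for every list and return the paths written."""
    written = []
    for name in lists:
        path = Path(conf_dir) / (name + ".conf")
        path.write_text(htdig_conf_for_list(name, archive_base_url, db_root))
        written.append(path)
    return written
```

A nightly cron job could then run htdig once per generated file, selecting it with htdig's -c option, followed by whatever merge step your htdig version needs. For private archives the crawler itself must also get past the web server's access control, which this sketch does not address.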

The approach I adopted helps overcome a problem with having search indexes
that contain information about both private and public data. If you have
this, you have to do one of the following:

1. If you are serious about security, use your own search script to run the
search engine's search and then filter the results it returns, removing
links and their associated synopsis information which the user is not
authorised to see. The problem with this is that if you have a large search
space, checking all the returned results is going to be demanding on
system performance.

2. If you don't mind people reading snippets of data they are not
authorised to see in the synopsis returned with each link, you let the
user see all the results returned. Having aroused their interest, you then
annoy them by refusing to let them follow one of the links that the search
just returned to them.
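The filtering step in option 1 could look something like the sketch below. It assumes, hypothetically, that a wrapper around the search engine returns hits as (url, excerpt) pairs, that archive URLs have the form http://host/pipermail/<list>/..., and that the set of lists the user may read has already been worked out from the authentication step.

```python
from typing import Iterable, List, Optional, Set, Tuple
from urllib.parse import urlparse

# A search hit as (url, excerpt); this shape is an assumption about what a
# wrapper around the search engine might hand back.
Hit = Tuple[str, str]


def list_name_from_url(url: str, archive_root: str = "pipermail") -> Optional[str]:
    """Return the list name in URLs like http://host/pipermail/<list>/..., else None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if archive_root in parts:
        i = parts.index(archive_root)
        if i + 1 < len(parts):
            return parts[i + 1]
    return None


def filter_hits(hits: Iterable[Hit], allowed_lists: Set[str],
                archive_root: str = "pipermail") -> List[Hit]:
    """Drop every hit, URL and excerpt together, that the user may not see."""
    kept = []
    for url, excerpt in hits:
        name = list_name_from_url(url, archive_root)
        # Anything that cannot be attributed to an allowed list is dropped,
        # so unknown URLs fail closed rather than leak a synopsis.
        if name is not None and name in allowed_lists:
            kept.append((url, excerpt))
    return kept
```

The per-hit check here is cheap; the cost the option describes comes from having to pull and check every hit in a large shared index before you know how many the user may see, so result paging can no longer be left to the search engine.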

My approach sidesteps both these issues reasonably neatly, but I'm sure
there are a dozen other ways of achieving the same objectives using any
combination of list manager, archiver and search engine.

>I'll definitely try with your suggestion.
>
>Since Pipermail is no longer developed, have you thought about a patch
>for external archivers like MHonArc or Hypermail?

I'm looking at producing a more generalised patch to simplify closer
integration of other search engines with Mailman archives. I guess it might
be worth expanding my thinking to cover mail archives produced by other
archivers and searching them with different search engines.

>Sincerely,
>Sasa