[Mailman-Developers] (More) pristine archives

Barry A. Warsaw barry@zope.com
Wed, 28 Aug 2002 09:26:29 -0400

>>>>> "TK" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:

    >> An interesting issue came up today while we were playing with a
    >> Bayesian spam classifier.  Mailman's archives aren't very
    >> clean.  Messages are sent to the archiver after various
    >> header munging steps, including the adding of the List-*
    >> headers and the Subject prefix.

    TK> The headers are in the raw archive and not in the monthly (or
    TK> quarterly, weekly) text format archive. I would rather stop
    TK> publicizing the raw archive even if the other archives are
    TK> publicly accessible. At least it should be configurable (in
    TK> mm_cfg).

Some headers are stripped before being added to the quarterly/weekly
mini-archive, but both see messages /after/ they've been munged.

(On the second point, I'll try to look at patch #594771.  That would
seem like a good opportunity to make raw archives optional.)

    >> We still want to do some munging, e.g. for anonymous lists.
    >> This tells me that we may want to move ToArchive up before
    >> CookHeaders in the global pipeline.

    TK> We use a modified version of mailman 2.0.x in Japan and we
    TK> like a feature of adding numbers in the subject header. The
    TK> users tend to reference articles by the number not by the
    TK> archive URL.  So, we want the archive to be munged.

That seems to be the consensus, i.e. the archive should reflect what
the members get.  Makes sense -- if you want a more pristine archive,
you can interpose a tee to a file before the message gets to Mailman,
or you could add a different handler module.  I'll leave things as is.
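
For anyone who does want the untouched copy, here is a minimal sketch of
what such a handler module might look like.  It is an illustration only,
not code shipped with Mailman: the module name SavePristine, the
PRISTINE_DIR environment variable, and the insert_before() helper are
made up for this example, and it assumes the usual handler convention of
a process(mlist, msg, msgdata) function plus the list's internal_name()
method.

```python
import os

# Hypothetical default location for untouched copies; not a Mailman setting.
DEFAULT_DIR = "/tmp/pristine-archives"


def process(mlist, msg, msgdata):
    """Append the message, as it looks at this point in the pipeline,
    to a per-list file in a simple mbox-like layout."""
    directory = os.environ.get("PRISTINE_DIR", DEFAULT_DIR)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, mlist.internal_name() + ".mbox")
    with open(path, "a") as fp:
        # unixfrom=True writes a leading "From " separator line.
        # Simplification: body lines starting with "From " are not escaped.
        fp.write(msg.as_string(unixfrom=True))
        fp.write("\n")


def insert_before(pipeline, new_handler, existing="CookHeaders"):
    """Return a copy of the handler-name list with new_handler placed
    immediately before existing, i.e. ahead of the header munging."""
    result = list(pipeline)
    result.insert(result.index(existing), new_handler)
    return result
```

If the site configuration exposes the handler list as GLOBAL_PIPELINE
(an assumption about your version), something like
GLOBAL_PIPELINE = insert_before(GLOBAL_PIPELINE, 'SavePristine') in
mm_cfg.py would run the copy step before CookHeaders adds the List-*
headers and the Subject prefix, while the regular archive stays munged.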

    TK> BTW, I'm preparing a patch for numbering the subject prefix.

Cool.  But this is likely a new feature that will have to wait until
after 2.1 final.