[Mailman-Developers] (More) pristine archives
Barry A. Warsaw
Wed, 28 Aug 2002 09:26:29 -0400
>>>>> "TK" == Tokio Kikuchi <firstname.lastname@example.org> writes:
>> An interesting issue came up today while we were playing with a
>> Bayesian spam classifier. Mailman's archives aren't very
>> clean. Messages are sent to the archiver after various
>> header-munging steps, including the addition of the List-*
>> headers and the Subject prefix.
TK> The headers are in the raw archive but not in the monthly (or
TK> quarterly, weekly) text-format archive. I would rather stop
TK> publicizing the raw archive even if the other archives are
TK> publicly accessible. At least it should be configurable (in
Some headers are stripped before being added to the quarterly/weekly
mini-archive, but both see messages /after/ they've been munged.
(On the second point, I'll try to look at patch #594771. That would
seem like a good opportunity to make raw archives optional.)
>> We still want to do some munging, e.g. for anonymous lists.
>> This tells me that we may want to move ToArchive up before
>> CookHeaders in the global pipeline.
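To make the ordering question concrete, here is a toy model in Python. It is a minimal sketch only: `FakeList`, `cook_headers`, `to_archive`, and `run_pipeline` are hypothetical, simplified stand-ins loosely modeled on the CookHeaders and ToArchive handlers, not Mailman's actual code, and the `(mlist, msg, msgdata)` handler signature is an assumption for illustration.

```python
import email
from email.message import Message


class FakeList:
    """Hypothetical stand-in for a mailing list; not Mailman's MailList class."""

    def __init__(self, name: str) -> None:
        self.real_name = name
        self.subject_prefix = f"[{name}] "
        self.archive: list[Message] = []


def cook_headers(mlist: FakeList, msg: Message, msgdata: dict) -> None:
    # Toy header munging: add the Subject prefix and a List-Id header.
    subject = msg.get("Subject", "")
    if not subject.startswith(mlist.subject_prefix):
        del msg["Subject"]
        msg["Subject"] = mlist.subject_prefix + subject
    msg["List-Id"] = f"<{mlist.real_name.lower()}.example.org>"


def to_archive(mlist: FakeList, msg: Message, msgdata: dict) -> None:
    # Archive an independent copy of the message exactly as it looks now.
    mlist.archive.append(email.message_from_string(msg.as_string()))


def run_pipeline(pipeline, mlist: FakeList, msg: Message) -> Message:
    # Run each handler in order on the same (mutable) message.
    msgdata: dict = {}
    for handler in pipeline:
        handler(mlist, msg, msgdata)
    return msg
```

With `[cook_headers, to_archive]` the archived copy carries `[Test] hello` and a `List-Id` header; with `[to_archive, cook_headers]` the archive keeps the original `hello` while members still receive the prefixed message.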
TK> We use a modified version of Mailman 2.0.x in Japan, and we
TK> like the feature of adding numbers to the Subject header. The
TK> users tend to reference articles by number, not by the
TK> archive URL. So, we want the archive to be munged.
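The numbering patch itself isn't shown in this thread, so the following is only a guess at the idea: a minimal Python sketch, assuming a hypothetical `[listname N]` prefix format and a hypothetical `numbered_subject()` helper, which also strips an earlier numbered prefix so that replies don't accumulate them.

```python
import re


def numbered_subject(subject: str, listname: str, number: int) -> str:
    """Hypothetical helper: prefix Subject with "[listname N]".

    Any earlier "[listname M]" prefix (e.g. one left after "Re: " in a
    reply) is removed first so prefixes don't pile up.
    """
    old_prefix = re.compile(
        r"\[" + re.escape(listname) + r"\s+\d+\]\s*", re.IGNORECASE
    )
    stripped = old_prefix.sub("", subject, count=1).strip()
    return f"[{listname} {number}] {stripped}"
```

For example, `numbered_subject("Re: [test 7] hello", "test", 8)` gives `[test 8] Re: hello`. A real implementation would also need a per-list counter kept in step with the archive numbering; this sketch simply takes the number as an argument.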
That seems to be the consensus, i.e. the archive should reflect what
the members get. Makes sense -- if you want a more pristine archive,
you can interpose a tee to a file before the message gets to Mailman,
or you could add a different handler module. I'll leave things as is.
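One way to get the "tee to a file" approach is sketched below in Python, assuming the MTA pipes each incoming message into a small script that saves the untouched bytes to an mbox and then hands the same bytes to Mailman's mail wrapper. The wrapper path `/usr/local/mailman/mail/mailman` and its `post LISTNAME` arguments are assumptions about a typical 2.x install (check your own aliases), and `tee()` / `forward_to_mailman()` are hypothetical names.

```python
import mailbox
import subprocess
from typing import Callable

# Assumed location of Mailman's mail wrapper; adjust for your installation.
MAILMAN_WRAPPER = "/usr/local/mailman/mail/mailman"


def save_pristine(raw: bytes, mbox_path: str) -> None:
    """Append the message, unmodified, to an mbox before Mailman sees it."""
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        box.add(raw)
        box.flush()
    finally:
        box.unlock()
        box.close()


def forward_to_mailman(raw: bytes, listname: str) -> None:
    # Hand the same bytes to the (assumed) wrapper, as the alias would.
    subprocess.run([MAILMAN_WRAPPER, "post", listname], input=raw, check=True)


def tee(raw: bytes, mbox_path: str, deliver: Callable[[bytes], None]) -> None:
    # Save first, then deliver, so the pristine copy exists even if delivery fails.
    save_pristine(raw, mbox_path)
    deliver(raw)
```

In the alias script you would read `sys.stdin.buffer.read()` and call `tee(raw, path, lambda data: forward_to_mailman(data, listname))`. Passing the delivery step in as a callable keeps the archiving part testable without a Mailman install.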
TK> BTW, I'm preparing a patch for numbering the subject prefix.
Cool. But this is likely a new feature that will have to wait until
after 2.1 final.