[Mailman-Developers] Boilerplate and content filtering [was: Introduction and Project Discussion]

Stephen J. Turnbull stephen at xemacs.org
Mon Apr 15 06:31:55 CEST 2013


Sreyanth writes:

 > Also, I would like to hear more about : Boilerplate stripper AND Better
 > content-filtering / handling error messages.
 > Boilerplate stripping is trivial to understand. But, can anyone elaborate
 > on Better content-filtering / handling error messages?

But boilerplate stripping is not necessarily trivial to implement,
because it's not always clear what boilerplate is.  I think it might
be a good idea to save it off and provide a link rather than discard
it, which leads to interesting questions of storage, shared links for
true boilerplate (this compresses storage of repeatedly encountered
text, yes, but more importantly the link will turn purple, i.e. show
as visited, so you don't need to click on it in the next message from
that user!), and user interface in general.
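
As a rough illustration of the "save it off and provide a link" idea, here is a minimal sketch. It assumes (purely for the example) that boilerplate is whatever follows the conventional "-- " signature delimiter, that a plain dict stands in for real storage, and that links live under a made-up base URL; none of these names come from Mailman.

```python
import hashlib


def boilerplate_key(text):
    """Hash whitespace-normalized text so repeated footers share one key."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_signature(body):
    """Split at the first "-- " delimiter line; return (content, signature)."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if line == "-- ":
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return body, ""


def strip_boilerplate(body, store,
                      base_url="https://lists.example.org/boilerplate/"):
    """Move the signature into `store` and leave a link in its place.

    `store` is a hypothetical dict standing in for persistent storage, and
    `base_url` is a placeholder, not a real Mailman endpoint.
    """
    content, signature = split_signature(body)
    if not signature.strip():
        return body  # nothing recognized as boilerplate; leave untouched
    key = boilerplate_key(signature)
    store.setdefault(key, signature)  # identical boilerplate is stored once
    return content.rstrip("\n") + "\n\n[boilerplate: " + base_url + key + "]\n"
```

Because the key depends only on the normalized text, the same footer always yields the same link, which is what lets a reader's mail client or browser show it as already visited. Deciding what counts as boilerplate in the first place remains the hard part and is not addressed here.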

Content filtering is mostly going to be about MIME handling: choice of
the appropriate text/* part and things like that, removing
images/video/etc where the list prohibits them, converting HTML/
wordprocessor attachments to plain text, removing MIME parts whose
Content-Type doesn't match filename or perhaps file(1) magic in the
content, etc.
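
A few of these MIME checks can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: the function names and the list of banned main types are invented for the example, the filename check relies on `mimetypes` guessing from the extension, and file(1) magic and HTML-to-text conversion are not covered.

```python
import mimetypes
from email.message import EmailMessage


def choose_text_part(msg: EmailMessage):
    """Return the preferred text/* body part (plain before HTML), or None."""
    return msg.get_body(preferencelist=("plain", "html"))


def type_matches_filename(part):
    """True unless the filename's extension implies a different type."""
    filename = part.get_filename()
    if not filename:
        return True
    guessed, _ = mimetypes.guess_type(filename)
    # An unknown extension gives no evidence of a mismatch.
    return guessed is None or guessed == part.get_content_type()


def filter_parts(msg: EmailMessage,
                 banned_maintypes=("image", "video", "audio")):
    """Return the leaf parts that survive the (hypothetical) list policy."""
    kept = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers are not content themselves
        if part.get_content_maintype() in banned_maintypes:
            continue  # the list prohibits this kind of media
        if not type_matches_filename(part):
            continue  # declared Content-Type disagrees with the filename
        kept.append(part)
    return kept
```

A real filter would also have to rebuild the message from the kept parts and decide what to do when nothing is left; this sketch only makes the selection step concrete.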

I can also imagine content filtering (or scoring!) based on word
choice ("WTF" is OK, spelling it out is not :-).  Also content
filtering based on stripping the quoted material out of top-posts and
replacing it with links (after checking that the quoted material is
indeed available in the archive!)  All coming with on/off options, at least
for those who remember the IBM 360saurus and other dinosaurs and still
prefer mail to web. :-)
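
The top-post case could look roughly like the sketch below. It assumes the caller has already looked up the quoted material in the archive and passes the resulting URL (or None if the lookup failed); the function name and the link format are invented for the example.

```python
def strip_trailing_quote(body, archive_url=None):
    """Replace a trailing block of ">"-quoted lines with an archive link.

    `archive_url` is assumed to come from a successful archive lookup done
    by the caller; with None, the message is returned unchanged.
    """
    lines = body.rstrip("\n").split("\n")
    start = len(lines)
    # Walk backwards over the trailing run of quoted or blank lines.
    while start > 0 and (lines[start - 1].startswith(">")
                         or not lines[start - 1].strip()):
        start -= 1
    has_quote = any(line.startswith(">") for line in lines[start:])
    if not has_quote or archive_url is None or start == 0:
        return body  # no trailing quote, no archive copy, or nothing else left
    kept = "\n".join(lines[:start]).rstrip("\n")
    return kept + "\n\n[quoted text: " + archive_url + "]\n"
```

Only a quote at the very end of the message is touched, so interleaved replies pass through unchanged, and an on/off option would simply decide whether this function is called at all.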

Error messages (I think this means delivery status notifications (DSN)
from mail servers) are a similar kind of problem to text-based
filtering, though somewhat more stylized.
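
For DSNs that follow the standard multipart/report layout, the stylized part can be read with Python's standard `email` parser, which exposes each header block of a message/delivery-status part as a nested message. The sketch below assumes a well-formed report; the function name is invented for the example, and the many non-standard bounce formats seen in practice would still need text-based heuristics.

```python
from email import message_from_string


def failed_recipients(raw):
    """Return (address, status) pairs for recipients reported as failed."""
    msg = message_from_string(raw)
    failures = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue  # malformed report: no parsed header blocks
        for block in blocks:
            action = (block.get("Action") or "").strip().lower()
            if action != "failed":
                continue  # per-message block, or a delayed/delivered report
            # Final-Recipient looks like "rfc822; user@example.com".
            recipient = block.get("Final-Recipient", "")
            address = recipient.split(";", 1)[-1].strip()
            failures.append((address, (block.get("Status") or "").strip()))
    return failures
```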

Steve

