
On Wed, 6 Jun 2001 18:29:23 -0400 barry@digicool.com (Barry A. Warsaw) wrote:
[I've moved this discussion over to mailman-developers. -BAW]
"JCL" == J C Lawrence <claw@2wire.com> writes:
JCL> Ability to unroll quoted printable.
JCL> Ability to unroll base64 encoded plain text.
JCL> Ability to strip blocks from message parts that match
JCL> stated patterns (eg Yahoo/MSN/Hotmail ads, corporate CYA
JCL> statements, etc).
JCL> Ability to filter on line length (eg hold for moderation or
JCL> auto-discard/reject).
Some of this will be added to mimelib when I get a chance. On the ...
Also, just some quick thoughts on de-mime-ing, which also address some things that Chuq has brought up, re: regexp filtering, auto-discarding, etc.
I'm of the opinion that regular expression predicates alone aren't going to cut it, and that anything more complex is just way way too complicated to attempt to expose through an email or web interface. Complexity is already our enemy, IMHO.
So what I'm envisioning is an extensible architecture, a la the message pipelines, where each filter is implemented in a separate Python module, conforming to a particular, yet-to-be-defined API. Mailman will provide a bunch of canned defaults, like "strip-mime-leaving-only-text/plain" or "match-vbs-attachments". There will probably be some kind of mix-in model for describing the action to take when a filter module matches. ...
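As one purely hypothetical reading of the architecture quoted above, the sketch below pairs a filter predicate with an action supplied by a mix-in. The API is explicitly yet to be defined, so every name here (`Filter`, `DiscardMixin`, `HoldMixin`, `MatchVbsAttachments`, `run_filters`) is an assumption made for illustration and not Mailman's actual interface.

```python
from email.message import Message


class Filter:
    """Hypothetical base class: a filter only answers 'does this message match?'."""

    def matches(self, msg: Message) -> bool:
        raise NotImplementedError


class DiscardMixin:
    """Hypothetical action mix-in: matching messages are thrown away."""
    action = "discard"


class HoldMixin:
    """Hypothetical action mix-in: matching messages wait for the moderator."""
    action = "hold"


class MatchVbsAttachments(DiscardMixin, Filter):
    """Canned filter in the spirit of 'match-vbs-attachments'."""

    def matches(self, msg: Message) -> bool:
        # walk() yields the message itself and every nested part.
        for part in msg.walk():
            name = part.get_filename()
            if name and name.lower().endswith(".vbs"):
                return True
        return False


def run_filters(msg: Message, filters) -> str:
    """Return the action of the first matching filter, else 'accept'."""
    for f in filters:
        if f.matches(msg):
            return f.action
    return "accept"
```

Swapping `DiscardMixin` for `HoldMixin` would change what happens on a match without touching the matching logic, which is one way the mix-in idea could keep predicates and actions separate.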
While a general and powerful mime handler would be nice, and is probably the right thing to do in terms of the long-term development, I think that one can get most of the benefit from a much simpler solution. A few months ago I hacked together a mime handler with the goal of making the stuff that comes from Outhouse, AOL, etc. look like plain-text mail, as well as enforcing prohibitions on posting images and other binaries. The handler is based on the mimetools library; it discards sections with certain mime types specified by per-list regexps, and removes multipart/* wrappers that become redundant after the stripping. Nothing fancy, but it has cleaned up 95% of the crap on our lists -- mostly text/html but also the occasional image/* or application/*. (Ripping out text/html works because so far it's always accompanied by corresponding text/plain, except for contributions from spammers, the deletion of which is a feature.)
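To make the idea concrete, here is a minimal sketch of that kind of stripping. It is not the patch itself: it uses Python's current `email` package rather than mimetools, and the names `strip_parts`, `banned`, and `CONTENT_HEADERS`, as well as the sample message, are assumptions made for illustration.

```python
import re
from email import message_from_string

# Headers that describe a part's body; copied upward when a wrapper collapses.
CONTENT_HEADERS = ("Content-Type", "Content-Transfer-Encoding", "Content-Disposition")


def strip_parts(msg, banned):
    """Drop parts whose MIME type matches any regexp in `banned`.

    Returns the (modified) message, or None if nothing is left.
    """
    ctype = msg.get_content_type()
    if any(pattern.match(ctype) for pattern in banned):
        return None
    # Anything that is not a real multipart container is passed through untouched.
    if msg.get_content_maintype() != "multipart" or not msg.is_multipart():
        return msg
    stripped = (strip_parts(part, banned) for part in msg.get_payload())
    kept = [part for part in stripped if part is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # The wrapper is now redundant: replace it with its only surviving child.
        child = kept[0]
        for header in CONTENT_HEADERS:
            del msg[header]
            if child[header] is not None:
                msg[header] = child[header]
        msg.set_payload(child.get_payload())
    else:
        msg.set_payload(kept)
    return msg


RAW = """\
From: someone@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XX"

--XX
Content-Type: text/plain

hello
--XX
Content-Type: text/html

<b>hello</b>
--XX--
"""

# Per-list patterns, matched against the start of each part's MIME type.
banned = [re.compile(r"text/html"), re.compile(r"image/"), re.compile(r"application/")]
cleaned = strip_parts(message_from_string(RAW), banned)
print(cleaned.as_string())
```

A message consisting only of text/html comes back as None here, which corresponds to the spammer case: the caller can treat that as "discard the posting".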
I was going to post the patch, but haven't gotten around to upgrading from 2.0beta6 and porting the code.
I agree that the UI for list admins to define what to do with what is likely to be the most challenging part of a good general-purpose solution. The simple mime handler is useful, I think, because although there's an awful lot that can be done with mime encoding, the vast majority of email traffic these days comes from just a few MUAs and is very pedestrian, mime-wise. Having a good set of canned defaults corresponding to these common cases should work pretty well.
-les les@2pi.org