Re: [Mailman-Users] Re: MIME messages

[I've moved this discussion over to mailman-developers. -BAW]
"JCL" == J C Lawrence <claw@2wire.com> writes:
JCL> Ability to unroll quoted printable.
JCL> Ability to unroll base64 encoded plain text.
JCL> Ability to strip blocks from message parts that match
JCL> stated patterns (eg Yahoo/MSN/Hotmail ads, corporate CYA
JCL> statements, etc).
JCL> Ability to filter on line length (eg hold for moderation or
JCL> auto-discard/reject).
Some of this will be added to mimelib when I get a chance. On the plus side, Python 2.1 and the future 2.2 have some nice support for all this via their unicode codecs. I.e., in the Python CVS there's now a codec for quoted printable, so you can essentially say something like:
subject = msg['subject'].decode('quopri')
if your subject header is quoted as per RFC 2047. I've only played around with this stuff a little bit, to get a feel for what you can do, so I haven't thought about APIs or doing the actual coding yet.
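For readers on Python 3, where `str` has no `.decode()` method, the same idea can be sketched with the standard library's `email.header` and `quopri` modules. This is only an illustration: the sample subject and body are hypothetical, and it is not the API the message above was planning.

```python
import quopri
from email.header import decode_header, make_header

# Hypothetical sample data, not taken from the thread.
raw_subject = "=?iso-8859-1?q?Caf=E9_menu?="            # RFC 2047 encoded-word
raw_body = b"na=C3=AFve text with a soft line=\n break"  # quoted-printable body

# decode_header() splits the header into (bytes, charset) chunks;
# make_header() reassembles them into a unicode-aware Header object.
subject = str(make_header(decode_header(raw_subject)))

# quopri.decodestring() undoes the quoted-printable transfer encoding only;
# the caller still has to apply the part's charset (assumed UTF-8 here).
body = quopri.decodestring(raw_body).decode("utf-8")

print(subject)
print(body)
```

Note that an RFC 2047 header carries its own charset and encoding markers, which is why the header and the body are decoded by different calls here.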
The downside is that I'm still targeting Mailman 2.1 at Python 2.0, so some of these features may not be available.
Also, just some quick thoughts on de-mime-ing, which also address some things that Chuq has brought up, re: regexp filtering, auto-discarding, etc.
I'm of the opinion that regular expression predicates alone aren't going to cut it, and that anything more complex is just way way too complicated to attempt to expose through an email or web interface. Complexity is already our enemy, IMHO.
So what I'm envisioning is an extensible architecture, a la the message pipelines, where each filter is implemented in a separate Python module, conforming to a particular, yet-to-be-defined API. Mailman will provide a bunch of canned defaults, like "strip-mime-leaving-only-text/plain" or "match-vbs-attachments". There will probably be some kind of mix-in model for describing the action to take when a filter module matches.
Then the admins can choose which filters they want, and what order they want to run them in. I'd actually envisioned something like this for the delivery pipeline, but that (giving admins control) has turned out to be not as necessary.
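Since the filter API is explicitly yet-to-be-defined, the following is only a guess at what such an architecture might look like, written against today's standard `email` package. Every name here (`BaseFilter`, `HoldMixin`, `DiscardMixin`, `run_pipeline`, and the two filter classes) is hypothetical; the mix-ins stand in for the "action to take when a filter matches".

```python
from email.message import EmailMessage


class HoldMixin:
    action = "hold"        # hypothetical action name


class DiscardMixin:
    action = "discard"     # hypothetical action name


class BaseFilter:
    action = "accept"

    def matches(self, msg):
        raise NotImplementedError


class MatchVbsAttachments(DiscardMixin, BaseFilter):
    """Matches any part whose filename ends in .vbs."""

    def matches(self, msg):
        for part in msg.walk():
            filename = part.get_filename() or ""
            if filename.lower().endswith(".vbs"):
                return True
        return False


class MatchHtmlParts(HoldMixin, BaseFilter):
    """Matches any message containing a text/html part."""

    def matches(self, msg):
        return any(p.get_content_type() == "text/html" for p in msg.walk())


def run_pipeline(msg, filters):
    """Run filters in the admin's chosen order; the first match decides."""
    for f in filters:
        if f.matches(msg):
            return f.action
    return "accept"


# Hypothetical sample message with a .vbs attachment.
sample = EmailMessage()
sample["Subject"] = "ILOVEYOU"
sample.set_content("see attachment")
sample.add_attachment(b"x", maintype="application",
                      subtype="octet-stream", filename="love.VBS")

verdict = run_pipeline(sample, [MatchHtmlParts(), MatchVbsAttachments()])
print(verdict)
```

In this sketch the ordering of the list passed to `run_pipeline` is the only per-list configuration, which is roughly the level of control the paragraph above proposes to hand to admins.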
Even this may turn out to be too complex, so there may be yet another
level of abstraction that a site admin can glom together, so that a
list admin would be presented with a limited set of `rhythms',
`themes', or `styles' that they can pick and choose from. I've
thought about a similar mechanism for list themes, like "announce-only
list" or "read-only mirror list".
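One way to picture this extra layer, purely as an assumption, is a site-level table that maps each theme name to an ordered list of canned filter names. The theme names and the `filters_for` helper below are hypothetical; the filter names are the two examples mentioned earlier in this message.

```python
# Hypothetical site-admin table: theme name -> ordered canned filters.
THEMES = {
    "plain-text-only": ["strip-mime-leaving-only-text/plain"],
    "no-executables": ["match-vbs-attachments"],
}


def filters_for(theme_names):
    """Expand the themes a list admin picked into one ordered filter list,
    dropping duplicates while keeping first-seen order."""
    ordered = []
    for theme in theme_names:
        for name in THEMES[theme]:
            if name not in ordered:
                ordered.append(name)
    return ordered


chosen = filters_for(["no-executables", "plain-text-only"])
print(chosen)
```

The list admin would then only ever see the keys of `THEMES`, while the site admin owns the values.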
The point is that there's a lot we can do very cleanly at the Python level, but that the more configurability we expose at the web/email interface, the less usable it becomes, IMHO. So, I'm thinking seriously about ways to preserve the power Mailman can provide to the Python hacker while employing abstractions to reduce the cognitive load on the list administrators and users.
-Barry

On Wed, 6 Jun 2001 18:29:23 -0400 Barry A Warsaw <barry@digicool.com> wrote:
The point is that there's a lot we can do very cleanly at the Python level, but that the more configurability we expose at the web/email interface, the less usable it becomes, IMHO. So, I'm thinking seriously about ways to preserve the power Mailman can provide to the Python hacker while employing abstractions to reduce the cognitive load on the list administrators and users.
One problem with this approach is that you tend to require list admins to have shell access.
-- J C Lawrence claw@kanga.nu ---------(*) http://www.kanga.nu/~claw/ The pressure to survive and rhetoric may make strange bedfellows

"JCL" == J C Lawrence <claw@kanga.nu> writes:
>> The point is that there's a lot we can do very cleanly at the
>> Python level, but that the more configurability we expose at
>> the web/email interface, the less usable it becomes, IMHO. So,
>> I'm thinking seriously about ways to preserve the power Mailman
>> can provide to the Python hacker while employing abstractions
>> to reduce the cognitive load on the list administrators and
>> users.
JCL> One problem with this approach is that you tend to require
JCL> list admins to have shell access.
It does mean that list admins will be more dependent on the site admin for customizations beyond the canned themes. Maybe that won't be good enough.
If not, then an approach would be an onion-type interface, where normally the list admin is given a simplified UI, but can drill down to a more complicated interface if necessary. One thing I /don't/ think will be appropriate is to allow list admins to upload or write Python code without vetting from the site admin, because then we're in the realm of restricted execution, etc. Going down that road has serious security implications that will be difficult and time consuming to get right.
Of course, we /could/ target Zope as the platform, in which case we'd get a lot of that for free...
just-thinking-out-loud-ly y'rs, -Barry

On Thu, 7 Jun 2001 10:15:15 -0400 Barry A Warsaw <barry@digicool.com> wrote:
"JCL" == J C Lawrence <claw@kanga.nu> writes:
The point is that there's a lot we can do very cleanly at the Python level, but that the more configurability we expose at the web/email interface, the less usable it becomes, IMHO. So, I'm thinking seriously about ways to preserve the power Mailman can provide to the Python hacker while employing abstractions to reduce the cognitive load on the list administrators and users.
JCL> One problem with this approach is that you tend to require list admins to have shell access.
It does mean that list admins will be more dependent on the site admin for customizations beyond the canned themes. Maybe that won't be good enough.
I'd look at it the other way around: Mailman then implicitly requires site admins to be responsive to and handle list owner requests in regard to MIME handling configs. As a SiteAdmin, I'd look at that as an unwelcome chore that some bloody software went and created for me.
Consider a site like SourceForge: In one swell foop you've upped their SysAdm expenses significantly.
If not, then an approach would be an onion-type interface, where normally the list admin is given a simplified UI, but can drill down to a more complicated interface if necessary.
I'd be tempted to do the following:
Mailman has a default set of configs that recognises various base types of MIME objects: text/html, text/plain, multipart/alternative, etc. It then presents a simple list of these items, each one with a set of four radio buttons:
- Ignore
- Strip from message
- Discard message if present
- Reject message if present
You then have a small text field in which the list admin can hand enter additional MIME types that he's interested in. Upon such entry they join the existing list of recognised MIME objects and also gain the above four radio buttons.
A separate section would then have the following two controls:
a) Flatten quoted printable?
b) Flatten base64 encoded text/* parts?
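The per-type radio buttons amount to a small table from MIME type to action. A minimal sketch of how such a table might be applied follows; the `Action` enum, the `decide` function, and the precedence (reject beats discard, which beats strip) are assumptions made for illustration, since the proposal does not say how conflicting settings combine.

```python
from email.message import EmailMessage
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"
    STRIP = "strip"
    DISCARD = "discard"
    REJECT = "reject"


# Hypothetical per-list settings, one entry per radio-button row.
policy = {
    "text/plain": Action.IGNORE,
    "text/html": Action.STRIP,
    "multipart/alternative": Action.IGNORE,
}


def decide(msg, policy):
    """Return (verdict, mime_types_to_strip).

    Assumed precedence: reject > discard > strip > ignore.
    Types missing from the policy are treated as IGNORE.
    """
    to_strip = []
    discard = False
    for part in msg.walk():
        ctype = part.get_content_type()
        action = policy.get(ctype, Action.IGNORE)
        if action is Action.REJECT:
            return ("reject", [])
        if action is Action.DISCARD:
            discard = True
        elif action is Action.STRIP and ctype not in to_strip:
            to_strip.append(ctype)
    if discard:
        return ("discard", [])
    return ("accept", to_strip)


# Hypothetical sample: a text/plain + text/html alternative message.
sample = EmailMessage()
sample.set_content("hello")
sample.add_alternative("<p>hello</p>", subtype="html")

result = decide(sample, policy)
print(result)
```

A MIME type typed into the free-text field would simply become another key in `policy`.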
One thing I /don't/ think will be appropriate is to allow list admins to upload or write Python code without vetting from the site admin, because then we're in the realm of restricted execution, etc.
Absolutely. That would be bad.
Of course, we /could/ target Zope as the platform, in which case we'd get a lot of that for free...
. o O ( <shudder>! )
J C Lawrence claw@kanga.nu ---------(*) http://www.kanga.nu/~claw/ The pressure to survive and rhetoric may make strange bedfellows

J C Lawrence <claw@kanga.nu> wrote:
Barry A Warsaw <barry@digicool.com> wrote: [..]
One thing I /don't/ think will be appropriate is to allow list admins to upload or write Python code without vetting from the site admin, because then we're in the realm of restricted execution, etc.
Absolutely. That would be bad.
Of course, we /could/ target Zope as the platform, in which case we'd get a lot of that for free...
. o O ( <shudder>! )
Please explain... what is bad about Zope?
Greetings, Norbert.
-- Norbert Bollow, Weidlistr.18, CH-8624 Gruet (near Zurich, Switzerland) Tel +41 1 972 20 59 Fax +41 1 972 20 69 nb@freedevelopers.net

On Wed, 6 Jun 2001 18:29:23 -0400 barry@digicool.com (Barry A. Warsaw) wrote:
[I've moved this discussion over to mailman-developers. -BAW]
"JCL" == J C Lawrence <claw@2wire.com> writes:
JCL> Ability to unroll quoted printable.
JCL> Ability to unroll base64 encoded plain text.
JCL> Ability to strip blocks from message parts that match
JCL> stated patterns (eg Yahoo/MSN/Hotmail ads, corporate CYA
JCL> statements, etc).
JCL> Ability to filter on line length (eg hold for moderation or
JCL> auto-discard/reject).
Some of this will be added to mimelib when I get a chance. On the ...
Also, just some quick thoughts on de-mime-ing, which also address some things that Chuq has brought up, re: regexp filtering, auto-discarding, etc.
I'm of the opinion that regular expression predicates alone aren't going to cut it, and that anything more complex is just way way too complicated to attempt to expose through an email or web interface. Complexity is already our enemy, IMHO.
So what I'm envisioning is an extensible architecture, a la the message pipelines, where each filter is implemented in a separate Python module, conforming to a particular, yet-to-be-defined API. Mailman will provide a bunch of canned defaults, like "strip-mime-leaving-only-text/plain" or "match-vbs-attachments". There will probably be some kind of mix-in model for describing the action to take when a filter module matches. ...
While a general and powerful mime handler would be nice, and is probably the right thing to do in terms of the long-term development, I think that one can get most of the benefit from a much simpler solution. A few months ago I hacked together a mime handler with the goal of making the stuff that comes from Outhouse, AOL, etc. look like plain-text mail, as well as enforcing prohibitions on posting images and other binaries. The handler is based on the mimetools library; it discards sections with certain mime types specified by per-list regexps, and removes multipart/* wrappers that become redundant after the stripping. Nothing fancy, but it has cleaned up 95% of the crap on our lists -- mostly text/html but also the occasional image/* or application/*. (Ripping out text/html works because so far it's always accompanied by corresponding text/plain, except for contributions from spammers, the deletion of which is a feature.)
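The patch itself is not shown in this thread, so the following is not that handler. It is a rough sketch of the same two steps (drop parts whose type matches a per-list regexp, then unwrap multipart/* containers left with a single child), written against the current `email` package because `mimetools` no longer exists in Python 3. The function name `strip_parts` and the sample pattern are hypothetical.

```python
import re
from email.message import EmailMessage


def strip_parts(msg, discard_re):
    """Return msg with matching parts removed, or None if nothing is left."""
    if msg.get_content_maintype() != "multipart":
        # Leaf part: keep it unless its type matches the discard pattern.
        return None if discard_re.fullmatch(msg.get_content_type()) else msg

    kept = []
    for child in msg.get_payload():
        cleaned = strip_parts(child, discard_re)
        if cleaned is not None:
            kept.append(cleaned)

    if not kept:
        return None
    if len(kept) == 1:
        # The wrapper is now redundant: promote the only child and carry
        # over the wrapper's non-Content-* headers (Subject, From, ...).
        child = kept[0]
        for name, value in msg.items():
            if not name.lower().startswith("content-") and name not in child:
                child[name] = value
        return child

    msg.set_payload(kept)
    return msg


# Hypothetical per-list pattern and sample message.
discard = re.compile(r"text/html|image/.*|application/.*")

sample = EmailMessage()
sample["Subject"] = "hi"
sample.set_content("plain body")
sample.add_alternative("<b>html body</b>", subtype="html")

cleaned = strip_parts(sample, discard)
print(cleaned.get_content_type())
```

A message that comes back as `None` had no acceptable part at all, which corresponds to the HTML-only spam case described above.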
I was going to post the patch, but haven't gotten around to upgrading from 2.0beta6 and porting the code.
I agree that the UI for list admins to define what to do with what is likely to be the most challenging part of a good general-purpose solution. The simple mime handler is useful, I think, because although there's an awful lot that can be done with mime encoding, the vast majority of email traffic these days comes from just a few MUAs and is very pedestrian, mime-wise. Having a good set of canned defaults corresponding to these common cases should work pretty well.
-les les@2pi.org

On 6 Jun 01 at 22:55, Les Niles is alleged to have scribbled:
On Wed, 6 Jun 2001 18:29:23 -0400 barry@digicool.com (Barry A. Warsaw) wrote:
[I've moved this discussion over to mailman-developers. -BAW]
[]
So what I'm envisioning is an extensible architecture, a la the message pipelines, where each filter is implemented in a separate Python module, conforming to a particular, yet-to-be-defined API. Mailman will provide a bunch of canned defaults, like "strip-mime-leaving-only-text/plain" or "match-vbs-attachments". There will probably be some kind of mix-in model for describing the action to take when a filter module matches. ...
While a general and powerful mime handler would be nice, and is probably the right thing to do in terms of the long-term development, I think that one can get most of the benefit from a much simpler solution. A few months ago I hacked together a mime handler with the goal of making the stuff that comes from Outhouse, AOL, etc. look like plain-text mail, as well as enforcing prohibitions on posting images and other binaries. The handler is based on the mimetools library; it discards sections with certain mime types specified by per-list regexps, and removes multipart/* wrappers that become redundant after the stripping. Nothing fancy, but it has cleaned up 95% of the crap on our lists -- mostly text/html but also the occasional image/* or application/*. (Ripping out text/html works because so far it's always accompanied by corresponding text/plain, except for contributions from spammers, the deletion of which is a feature.)
this is exactly what I'm looking for (as a user)! (I started a thread on it yesterday)
as to the UI, I see a simple set of options like
( ) keep text/html
( ) keep image/*
( ) keep application/*
.
.
as a starting point at least. the admin can select one or more of them. with all of them off, only text/plain is kept. if a message has no plain text portion, it is held for the moderator.
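The rule described here (text/plain is always kept, each ticked box adds one more allowed type, and a message with no plain text part is held) could be sketched as below. The function name `disposition` and the use of shell-style patterns via `fnmatch` are assumptions for illustration.

```python
from email.message import EmailMessage
from fnmatch import fnmatch


def disposition(msg, keep_patterns=()):
    """Return (verdict, kept_types) for the checkbox settings given.

    keep_patterns holds the ticked boxes, e.g. ["text/html", "image/*"].
    """
    allowed = ("text/plain",) + tuple(keep_patterns)
    leaf_types = [p.get_content_type() for p in msg.walk()
                  if p.get_content_maintype() != "multipart"]
    if "text/plain" not in leaf_types:
        return ("hold", [])          # no plain text portion: moderator decides
    kept = [t for t in leaf_types
            if any(fnmatch(t, pattern) for pattern in allowed)]
    return ("accept", kept)


# Hypothetical samples.
html_only = EmailMessage()
html_only.set_content("<p>hi</p>", subtype="html")

both = EmailMessage()
both.set_content("hi")
both.add_alternative("<p>hi</p>", subtype="html")

print(disposition(html_only))
print(disposition(both))
print(disposition(both, ["text/html"]))
```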
maybe some thought can also be given to filtering out hoax virus warnings? I've put in a regexp on 'virus' in the subject which will stop postings for moderator approval, but as I said yesterday, the regexp matching is a bit limited in what it sends back to the user, being singularly uninformative. some configurability would be nice there.... as in, when regexp x is matched, send message x, when Y is matched, send message Y etc.... instead of the blanket 'suspicious header' thing (-:
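The "when regexp x is matched, send message x" idea is essentially a list of (pattern, notice) pairs. A small sketch, with hypothetical rules and a hypothetical `hold_notice` helper; this is not an existing Mailman setting:

```python
import re

# Hypothetical rules: (pattern tried against the Subject, notice for the poster).
RULES = [
    (re.compile(r"\bvirus\b", re.IGNORECASE),
     "Your post looks like a virus warning and is held for the moderator."),
    (re.compile(r"\bunsubscribe\b", re.IGNORECASE),
     "Administrative requests belong on the list's -request address."),
]


def hold_notice(subject):
    """Return the notice for the first matching rule, or None if none match."""
    for pattern, notice in RULES:
        if pattern.search(subject):
            return notice
    return None


print(hold_notice("New VIRUS alert!!"))
print(hold_notice("kite flying this weekend"))
```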
[]
pretty well.
-les les@2pi.org
-- Living in South Africa Flying power Kites Chasing 3'6" gauge trains http://terrapin.ru.ac.za/satrain

"LN" == Les Niles <les@2pi.org> writes:
LN> I was going to post the patch, but haven't gotten around to
LN> upgrading from 2.0beta6 and porting the code.
Please do! I'm not quite ready to start addressing this issue in code yet, but it'll be nice to have a couple of approaches in the patch manager to look at when I do.
If we can get an 80/20 solution for MM2.1, I think that'd be a big win.
-Barry
participants (5)
- barry@digicool.com
- David A. Forsyth
- J C Lawrence
- Les Niles
- Norbert Bollow