The email package and KLEZ mails

Wed May 29 22:17:23 EDT 2002

[Sheila King]
> [Anthony Baxter]
> > [François Pinard]

> > > In my experience, incorrect MIME structure is one of the numerous
> > > hints about mail being SPAM.  I do not remember a single false positive.

> > I wish. I have to deal with end-user email, and trust me, it's not all
> > spam.

> I concur with Anthony.  I have written an email filter package using the
> email module and if you use the strict Parser class included in that
> module, it does throw away too much good email (because any good mail
> thrown away is too much).

Maybe the `email' package is stricter than the various MIME processing
tools that were in Python 1.5.2 and still exist in more recent versions,
but I would be tempted to think they are of comparable strictness.  I do
not really know.

The proverb says that "alike people get together", which might explain why
I do not see more problems: most of my correspondents have mailer agents
which do a fair job at MIME generation.  And when MIME mistakes happen,
it is usually sufficient to raise the subject with my correspondents,
who are usually happy to get the problem solved at their end.

Often (but not necessarily), badly structured messages come from people
who do not care much.  Otherwise, they would have set themselves up better.
As I much prefer people who care, from my viewpoint, there is a significant
correlation between a message being MIME-erroneous and a message not being
worth much interest.

> Moreover, as I've mentioned in other posts and email correspondence,
> if you're writing software for end users, you really can't just
> tell them: "Oh, all those mails that caused errors...they were just
> non-RFC compliant. Probably SPAM or virus."

If you are writing filters for everybody, you are probably right.  When I
write filters for my friends or for myself, in my experience, careless
MIME may be filtered out as SPAM, and we do not lose much in practice :-).
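
As a rough illustration of such a personal filter, here is a minimal
sketch.  It assumes a current Python 3 `email' package, which records
parsing problems in a `defects' list on each message part; this is not
the 2002 API discussed in this thread.  The function name
`looks_mime_broken' is hypothetical, and treating any defect as a SPAM
hint is the policy described above, not a recommendation for end-user
software.

```python
import email


def looks_mime_broken(raw_text):
    """Return True if the parser recorded a defect on any part.

    Hypothetical helper: a personal filter could use the result as one
    SPAM hint among others.
    """
    # The default parser is lenient: it never raises on bad MIME, it
    # only appends defect objects to the affected message part.
    msg = email.message_from_string(raw_text)
    return any(part.defects for part in msg.walk())


# A multipart message that declares no boundary is structurally broken.
BROKEN = (
    "From: someone@example.com\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "body text\n"
)

# A plain single-part message with a proper header/body separator.
CLEAN = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)

if __name__ == "__main__":
    print(looks_mime_broken(BROKEN))
    print(looks_mime_broken(CLEAN))
```

A filter meant for everybody would more likely log the defects and
deliver the message anyway.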

> Secondly, why is it that the three other mail readers I use (Agent,
> Pegasus, and PocoMail) are all able to parse these messages?  I also
> agree with the idea that applications must be strict in what they write
> and liberal in what they accept.

This is a good principle, but only when kept within reasonable bounds.
Users should be on the side of being strict, and applications should be on
the side of being liberal.  Users might suffer uselessly by being overly
ascetic, applications might miss their goal through unlimited friendliness.

For example, I expect compilers to raise diagnostics and help me at being
strict, because being overly liberal for a compiler is just not helpful.
Another example, a sad one, is the messy state of HTML all around us,
which comes from browsers having been far too liberal, for far too long.

If mailer agents are very lenient about MIME mis-formatting, they actively
prevent progress.  They do not really help it, as they trigger confusion.
Moreover, by implementing MIME poorly, they throw discredit on a good idea.
MIME standards are not that hard to read, you know.  It is a mystery to
me why some mail agents mangle the MIME they generate, or fail to assemble
it conveniently, in the spirit of the standards, at presentation time.

> I have written a "smart parser" class that I am using in my email
> filter. I use this class instead of the Parser class provided with the
> email module.  I provide the code below for all interested parties.
> [...]  Code follows the signature. Enjoy,

I'm saving it for possible later use!  Thanks for providing this...

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard