[Email-SIG] fixing the current email module

Glenn Linderman v+python at g.nevcal.com
Fri Oct 9 00:50:37 CEST 2009


On approximately 10/8/2009 4:40 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
>
>  > >  > If conversions are avoided, then octets are unlikely to be out of 
>  > >  > range?
>  > >
>  > > Haven't looked in your spam bucket recently, I guess.  Spammers
>  > > regularly put 8 bit characters into headers (and into bodies in
>  > > messages without a Content-Type header), for one thing.
>  > 
>  > I'm aware of that, but if conversions are not done, octets are unlikely 
>  > to be _reported_ to be out of range....
>
> Conversions will eventually be done.  "Best it were done quickly."
>   

Disagree.  Deferring the conversions defers failure issues to the point 
where the code (hopefully) somewhat understands the type of data being 
manipulated, and can then handle it appropriately.  Converting up front 
causes errors in things that may never be touched or needed, so the 
error detection and handling is wasteful.
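
To make the deferral argument concrete, here is a minimal sketch. The 
LazyHeader class and its text() method are hypothetical names made up for 
illustration, not part of the email module; the point is only that storing 
raw octets cannot fail, and a bad header costs nothing until someone 
actually asks for it.

```python
class LazyHeader:
    """Hypothetical wrapper: keeps the raw octets, decodes only on access."""

    def __init__(self, raw: bytes):
        self.raw = raw  # stored untouched, so nothing can fail here

    def text(self, charset: str = "ascii") -> str:
        # The conversion, and any failure, happens only when a caller asks.
        return self.raw.decode(charset)


headers = {
    "Subject": LazyHeader(b"Hello"),
    "X-Junk": LazyHeader(b"caf\xe9"),  # 8-bit octet, not valid ASCII
}

# Touching only Subject never triggers an error for X-Junk.
subject = headers["Subject"].text()
```

A client that never looks at X-Junk never pays for detecting or handling 
its bad octet.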

>  > > Most clients are simply not going to be prepared for the kind of
>  > > crap I see in /var/mail/turnbull every day.
>  > 
>  > Are you referring to most email clients, or most 
>  > Python-email-library-using clients?
>
> Sorry.  When I mean "MUA" I try to say "MUA".  By "client", I'm
> referring to the higher level logic that is going to be calling the
> email module.
>   

Yeah, differing terminology between people who haven't discussed the 
topic before can slow communication.

So for headers, which are supposed to be ASCII, or encoded via RFC rules 
to ASCII (no 8-bit chars), the discovery of an 8-bit char should 
produce a defect report, and the value should then simply be converted 
to Unicode as if it were Latin-1 (since there is no other knowledge 
available that could produce a better conversion).  And if the result of 
that is not expected 
by the client (your definition), then the client should either notice 
the defect report and reject it based on that, or attempt to parse it, 
and reject it if it encounters unexpected syntax.  After all, this is, 
for that client, "raw user input" (albeit from a remote source) so fully 
error checking the input is appropriate.
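
A rough sketch of that header handling follows. The function name 
decode_header_value and the shape of the defect list are assumptions made 
for illustration, not the email module's actual API. Latin-1 maps every 
octet to a code point, so the decode step itself cannot fail.

```python
def decode_header_value(raw: bytes):
    """Hypothetical helper: return (text, defects) for one header value."""
    defects = []
    # Headers are supposed to be pure ASCII; record anything that is not.
    if any(octet > 0x7F for octet in raw):
        defects.append("non-ASCII octet in header")
    # Latin-1 accepts all 256 octet values, so this never raises.
    return raw.decode("latin-1"), defects


text, defects = decode_header_value(b"caf\xe9")
clean_text, clean_defects = decode_header_value(b"Hello")
```

The client can then either reject the value because defects is non-empty, 
or try to parse text and reject it on unexpected syntax.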

>  > Is it your point of view, then, that incorrectly formed email should be 
>  > mostly treated as SPAM?
>
> Heavens no!  Not by the email module, anyway!  The email module should
> not know about spam (but see Barry's "we're having spam for Launchpad"
> post: if you're that good, anything goes!), except maybe at a very
> high level.
>   

I didn't think you'd think that, but things you were saying seemed to be 
implying that.

>  > Your "hit me with your best shot" comment indicates that you want a
>  > failure code or exception when the data is bad, and then a way to
>  > "retry accepting errors"?
>
> My current thinking is that the email module should return an object
> representing a partial parse.  The way that you find out if it is
> partial is to try to access some data that "should" be in the object.
> If the parse succeeded, the accessor returns the data (which might be
> empty).  If the parse did not succeed, you get an AttributeError.
> (This is just a paraphrase of what I wrote in response to Oleg.)

Yeah, or some error, anyway.
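
For reference, the partial-parse idea as I understand it might look 
something like the sketch below. PartialParse and its field names are 
hypothetical, invented for illustration; the only point is that a parsed 
but empty field returns its (empty) value, while a field the parser never 
produced raises AttributeError.

```python
class PartialParse:
    """Hypothetical result object holding only the fields that parsed."""

    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]  # may legitimately be empty
        raise AttributeError(name)


msg = PartialParse(subject="")  # subject parsed (empty), sender did not

parsed_subject = msg.subject
has_sender = hasattr(msg, "sender")
```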

The problem with the APIs that are spelled __str__ and __bytes__ is that 
they have no way to report errors other than raising exceptions... the 
Python way.  Since the email library is trying to avoid raising 
exceptions in large blocks of its code, it is non-Pythonic (which is 
probably part of what Oleg is complaining about).  But because it needs 
to avoid exceptions, and is therefore non-Pythonic, it may be 
inappropriate to spell very many of its APIs __str__ and __bytes__, 
because those spellings are Pythonic, and require exceptions.  Once you 
become non-Pythonic in one area, you may have to also be non-Pythonic in 
some other areas...
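
A small sketch of the tension, with made-up names (the Header class and 
its as_text() method are hypothetical, not the email module's API): 
__str__ must return a str, so its only error channel is an exception, 
whereas an ordinary method can hand back the defects alongside the text.

```python
class Header:
    """Hypothetical header object holding raw octets."""

    def __init__(self, raw: bytes):
        self.raw = raw

    def __str__(self):
        # Must return a str; the only way to signal failure is to raise.
        return self.raw.decode("ascii")

    def as_text(self):
        # Non-exception spelling: return (text, defects) instead.
        try:
            return self.raw.decode("ascii"), []
        except UnicodeDecodeError as exc:
            return self.raw.decode("latin-1"), [str(exc)]


good = str(Header(b"ok"))
text, defects = Header(b"\xe9").as_text()
```

Calling str() on the second header would raise UnicodeDecodeError, which 
is exactly what the rest of the library is trying to avoid.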

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


