[Email-SIG] fixing the current email module

Glenn Linderman v+python at g.nevcal.com
Wed Oct 7 19:34:05 CEST 2009


On approximately 10/7/2009 3:33 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
>
>  > > If you mean that the email module will keep track of what form the
>  > > object is currently represented by, that will eventually result in
>  > > "UnicodeError: octet out of range: 161, ascii".
>  > 
>  > The above sentence does not communicate your meaning to me... or any 
>  > meaning, actually.  Can you explain?
>
> Yes, that Unicode error is one that took years for Mailman to work
> around.  If we are going to be converting different objects at
> different times, I'm sure we'll get to see it again in the future.  Oh,
> joy.
>   

Ah, a historical remark!  So that's why it was lost on me; I'm new to 
the Python world (but I've been programming since 1975...)


>  > If conversions are avoided, then octets are unlikely to be out of 
>  > range?
>
> Haven't looked in your spam bucket recently, I guess.  Spammers
> regularly put 8 bit characters into headers (and into bodies in
> messages without a Content-Type header), for one thing.
>   

I'm aware of that, but if conversions are not done, octets are unlikely 
to be _reported_ to be out of range....


>  > And the email module must be aware of the form of the data in 
>  > order to manipulate it in any format other than wire format, but 
>  > fortunately, wire format declares the format of the data (not to say 
>  > there is not buggy wire format data -- but that is an issue best avoided 
>  > by avoiding as many conversions as possible).
>
> "Best" I can't speak to; you obviously are willing to accept a much
> higher error rate than I am.  "Robust" handling of buggy wire format
> data means that the email module must do something sane with it before
> giving it to the application.  Maybe it's reasonable to do that
> lazily, and/or cache the result, but access to bogus data (that the
> email module can determine is bogus or suspicious) must not be allowed
> unless the client says "hit me with your best shot" explicitly.  Most
> clients are simply not going to be prepared for the kind of crap I see
> in /var/mail/turnbull every day.
>   

Are you referring to most email clients, or most 
Python-email-library-using clients?  It seems like most email clients 
are being hit with the same stuff you are seeing... every day... and are 
handling it somehow... although anti-spam filters do eliminate some of 
it before the end user's MUA sees it, depending on the ISP, etc.

Is it your point of view, then, that incorrectly formed email should be 
mostly treated as SPAM?  Your paragraph above could be interpreted that 
way.  Oleg's point is also valid though, so it seems that isn't your 
point of view.

Does your "hit me with your best shot" comment mean that you want a 
failure code or exception when the data is bad, and then a way to "retry, 
accepting errors"?
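Here is one way to read the "hit me with your best shot" idea, sketched in Python 3. The header is decoded lazily and the result cached. By default, an exception is raised when the octets are not clean, and a separate, explicit call accepts errors. `LazyHeader`, `StrictDecodeError`, and `text_accepting_errors` are hypothetical names for illustration only and are not part of the email package. Plain ASCII stands in for real RFC 2047 decoding.

```python
from functools import cached_property


class StrictDecodeError(ValueError):
    """Hypothetical error: the header contains octets outside ASCII."""


class LazyHeader:
    """Hypothetical header wrapper: raw octets are kept as received and
    only decoded when the application asks for text."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw  # wire-format octets, never modified

    @cached_property
    def text(self) -> str:
        # Strict decode on first access; only a successful result is cached.
        try:
            return self.raw.decode("ascii")
        except UnicodeDecodeError as exc:
            raise StrictDecodeError(
                f"octet {self.raw[exc.start]} at offset {exc.start} is not ASCII"
            ) from exc

    def text_accepting_errors(self, errors: str = "replace") -> str:
        # Explicit opt-in for suspect data: the caller picks the codec
        # error handler ("replace", "surrogateescape", ...).
        return self.raw.decode("ascii", errors=errors)


# Usage: try the strict path first, then retry while accepting errors.
header = LazyHeader(b"Caf\xa1 deals")
try:
    subject = header.text
except StrictDecodeError:
    subject = header.text_accepting_errors()
```

With this design, the default path is the safe one. Seeing suspect data requires a deliberate second call.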


>  > I was pushing back from your declaration that an archiver would
>  > always want string output
>
> Please don't push back; we won't get anywhere.  Use cases are
> *examples*, not complete specifications of all possible inputs and
> outputs.  Use cases should be simple and clear cut.  If you want a
> different use case, state it.  In fact in the real world, *all* of the
> archivers I know of produce text formats on disk, either deleting
> multimedia objects or saving them off and linking to them via URLs in
> the text.  If you know of a different kind of archiver, add it as a
> use case.
>   

I misunderstood the purpose of your list.  Sure, everything in your list 
is a good example of real-world uses.
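The archiver use case Stephen describes can be sketched with the standard library's email package. It writes text to disk, saves non-text parts off, and links them by URL. This is an illustration only. `archive_message`, the output directory, and the `url_base` link scheme are hypothetical. A real archiver would also handle HTML parts, unusual charsets, filename collisions, and stricter filename sanitizing.

```python
import email
from email import policy
from pathlib import Path


def archive_message(raw: bytes, outdir: Path, url_base: str) -> str:
    """Hypothetical archiver: return the text of a message, saving every
    non-text/plain leaf part to outdir and replacing it with a URL."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    outdir.mkdir(parents=True, exist_ok=True)
    pieces = []
    for n, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        if part.get_content_type() == "text/plain":
            # get_content() undoes the transfer encoding and the charset.
            pieces.append(part.get_content())
            continue
        # Strip any directory components from the sender-supplied name.
        name = Path(part.get_filename() or f"part-{n}.bin").name
        payload = part.get_payload(decode=True) or b""
        (outdir / name).write_bytes(payload)
        pieces.append(f"[attachment: {url_base}{name}]")
    return "\n".join(pieces)
```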

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking