[Email-SIG] I miss size() (and some latest frustration)

Glenn Linderman v+python at g.nevcal.com
Thu Mar 24 23:54:48 CET 2011


On 3/24/2011 2:41 PM, Barry Warsaw wrote:
> On Mar 24, 2011, at 05:10 PM, Steffen Daode Nurpmeso wrote:
>
>> It would be great if the message (file) size would also be
>> provided as a public method, so that code-flow decisions can be
>> made dependent upon the plain size of a message.
>> (The size is known without parsing for many real-life message
>> objects anyway or can be detected *cheaply*.  True, e.g., for
>> all Message objects which are created by mailbox.py.)
> Certainly the normal FeedParser will see every byte of the message, even if it
> does save parts of it on disk.  Mailman 3's LMTP server also sees every byte
> and tucks the size away on an .original_size attribute of its Message
> subclass.
>
> But how would you handle it when you are creating the message yourself?  I
> think there are too many places you'd have to hook to get an accurate reading,
> or you'd have to essentially serialize it via a generator before you'd know,
> so it's less than helpful.
>
> It may indeed be possible to ask some external process what the size of the
> message is, but it would likely be a hint you couldn't necessarily trust.
> (I.e. the server might only have an approximate size.)
>
> So, I'm not sure whether the email package can have a consistent notion of a
> message's 'size'.  Perhaps though it ought to define an attribute for when the
> message is created by a parser, but let it be writable so that e.g. your
> application could get it from an IMAP server or whatever, and stick it in the
> attribute.

When created by a parser, it could have the notion of size-seen-so-far, 
or bytes-fed.  Once the whole message has been processed, the size of 
the message would be known, as well as of each piece.
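
A rough sketch of the bytes-fed idea, assuming a thin wrapper around the
existing BytesFeedParser rather than any change to the email package itself.
The class name CountingFeedParser and the attribute names bytes_fed and
original_size are made up for illustration (the latter borrowed from the
Mailman 3 attribute mentioned above); none of them are part of the email API.

```python
from email.feedparser import BytesFeedParser


class CountingFeedParser:
    """Hypothetical wrapper: counts every byte handed to the real parser."""

    def __init__(self):
        self._parser = BytesFeedParser()
        self.bytes_fed = 0  # size seen so far

    def feed(self, data):
        self.bytes_fed += len(data)
        self._parser.feed(data)

    def close(self):
        msg = self._parser.close()
        # Assumed attribute name; left writable so an application could
        # overwrite it with a size obtained elsewhere (e.g. from a server).
        msg.original_size = self.bytes_fed
        return msg


raw = b"From: a@example.com\r\nSubject: hi\r\n\r\nbody text\r\n"
parser = CountingFeedParser()
for i in range(0, len(raw), 16):  # feed in small chunks
    parser.feed(raw[i:i + 16])
message = parser.close()
```

This only yields the total size. Per-piece sizes would need the count to be
taken inside the parser, where the part boundaries are actually seen.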

Incomplete messages, such as those from IMAP servers for which only 
partial requests have been made for pieces, could only get the concept 
of "total size" from the server, if it provides it.  Since POP servers 
do, I think IMAP would also, but I'm not an IMAP expert.
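
For the POP case, poplib's POP3.list() returns a (response, listing, octets)
tuple in which each listing entry is a bytes line of the form
b"msgnum octets". IMAP does define an RFC822.SIZE fetch item for the same
purpose. A minimal sketch of turning such a listing into a size table follows;
the helper name sizes_from_pop_listing is hypothetical, and the listing is
hard-coded sample data rather than output from a real server.

```python
def sizes_from_pop_listing(lines):
    """Map message number -> size in octets from POP3 LIST response lines."""
    sizes = {}
    for line in lines:
        number, octets = line.decode("ascii").split()[:2]
        sizes[int(number)] = int(octets)
    return sizes


# Sample data shaped like the second element of poplib.POP3.list()
listing = [b"1 1205", b"2 48213"]
sizes = sizes_from_pop_listing(listing)
```

As noted above, a server-reported size is a hint: it may not match the
byte count a parser would see after line-ending conversion.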

>> It's also so unfortunate that 'headersonly' of Parser is in fact treated as
>> "a backwards compatibility hack", effectively consuming the entire input
>> nonetheless.  And *DesignThoughts* treats lazy parsing/partial loading as an
>> "interesting idea" only, though i can think about many cases where it is a
>> good thing to parse a Message{Headers[/Part/Part/Part...]}  sequentially.
>>
>> E.g., why should a spam detector load an entire message if it only wants to
>> check addresses against some white-/blacklists and simply throw away bad
>> hits.  Even more, why should a company's dispatcher read all the content if
>> it's only about to rewrite addresses and dispatch the mail to some other
>> internal server.  (Of course - hey, it's you, you know *such* more about this
>> stuff than i do.)
> Do you have suggestions for how the email package can help with these use
> cases?  Do you have specific API or implementation proposals?

For message parsing, it seems like allowing registered callbacks for
various pieces would be handy... "Call me when you parse this type of
header" (or body part, etc.).
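
One possible shape for this, sketched outside the email package: a scanner
that reads only the header block from a binary file object, parses it with
BytesParser, and calls whatever was registered for each header name. The
class HeaderCallbackScanner and its methods on_header and scan are
hypothetical names, not an existing or proposed API.

```python
import io
from email.parser import BytesParser


class HeaderCallbackScanner:
    """Hypothetical: invoke registered callbacks for matching headers."""

    def __init__(self):
        self._callbacks = {}

    def on_header(self, name, callback):
        # Header names are matched case-insensitively.
        self._callbacks.setdefault(name.lower(), []).append(callback)

    def scan(self, fp):
        # Read only up to the blank line that ends the header block,
        # leaving the body unread in fp.
        header_lines = []
        for line in fp:
            if line in (b"\r\n", b"\n"):
                break
            header_lines.append(line)
        headers = BytesParser().parsebytes(
            b"".join(header_lines), headersonly=True)
        for name, value in headers.items():
            for callback in self._callbacks.get(name.lower(), []):
                callback(name, value)
        return headers


seen = []
scanner = HeaderCallbackScanner()
scanner.on_header("From", lambda name, value: seen.append(value))
fp = io.BytesIO(
    b"From: a@example.com\r\nTo: b@example.com\r\n\r\nlarge body...\r\n")
scanner.scan(fp)
rest = fp.read()  # the body is still available, untouched by the scan
```

Here the callbacks fire after the whole header block has been read, not as
each header is parsed. Firing them during parsing, or for body parts, would
need hooks inside the feed parser itself.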