way to extract only the message from pop3

Tim Williams tim at tdw.net
Thu Apr 5 19:14:24 EDT 2007


On 05/04/07, Collin Stocks <collinstocks at gmail.com> wrote:

> On 3 Apr 2007 12:36:10 -0700, flit <superflit at gmail.com> wrote:
> > Hello All,
> >
> > Using poplib in python I can extract only the headers using the .top,
> > there is a way to extract only the message text without the headers?
> >

> so get two strings: only headers, and the whole message.
> find the length of the headers, and chop that off the beginning of the whole
> message:
>
> > message=whole_message[len(headers):None]
>

This way you have to perform two downloads: the headers and the whole
message. You then join each into a string and subtract one from the
other by slicing or other means.

(other means?   body = whole_message.replace(headers,'' )  or maybe not ! :)  )

The body starts after the first blank line in the message, which is
the line that ends the header block.  This is a good starting point
for something simple like my earlier suggestion:

   msg = '\r\n'.join( M.retr(i+1)[1] )    #  retrieve the email into string
   hdrs,body = msg.split('\r\n\r\n',1)    # split it into hdrs & body

If the original poster required the body to be separated from the
headers (and a private reply from the OP to my original post
suggested this was probably the case), then splitting a joined whole
message at the first blank line is sufficient and only requires one
download, without using the email module.
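
For anyone reading this under Python 3, where poplib's retr() returns
the message lines as bytes rather than str, the same single-download
split might look like the sketch below.  The function name and the
sample lines are made up for illustration; in a real session the list
would come from M.retr(i+1)[1].

```python
def split_headers_body(lines):
    """Split a retrieved message into (headers, body) at the first blank line.

    `lines` is a list of byte strings, one per message line, without
    line terminators (the shape of the second item in retr()'s result).
    """
    msg = b"\r\n".join(lines)
    # partition() never raises: a message without a blank line
    # simply yields an empty body.
    headers, _, body = msg.partition(b"\r\n\r\n")
    return headers, body


# Hypothetical stand-in for M.retr(i + 1)[1]
sample_lines = [
    b"From: alice@example.com",
    b"Subject: hello",
    b"",
    b"First body line",
    b"",
    b"Second paragraph",
]
headers, body = split_headers_body(sample_lines)
```

Using partition() instead of split(..., 1) is only a convenience: it
avoids an unpacking error if a message has no body at all.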

If the OP required just the text parts extracted from the message then
it gets a bit trickier.  The email module is the way to go, but not
quite how a previous poster used it.

Consider an email that was routed through my (python) SMTP servers and
filters today:

Content: ['text/plain', 'text/html', 'message/delivery-status',
'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
'text/plain', 'text/html']

Is text/html  a text part or an html part for this exercise ?  :)

You need to walk the parts and use something like

# part.get_content_maintype() requires a further call
# to get_content_subtype(), so use
# part.get_content_type() instead.

required = ['text/plain', 'text/tab-separated-values']
text_parts = []    # create the list once, before the loop
for part in EMAIL_OBJ.walk():
    if part.get_content_type() in required:
        # append the payload string, not the part object, so join() works
        text_parts.append(part.get_payload())

# print all the text parts separated by a line of '='
print(('\r\n' + '='*76 + '\r\n').join(text_parts))
# end
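
As a fuller illustration of walking the parts, here is a minimal
Python 3 sketch that also undoes the transfer encoding and decodes
each part with its declared charset.  The function name, the fallback
to utf-8 and the demo message are assumptions made for the example,
not something from the thread.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

REQUIRED = ("text/plain", "text/tab-separated-values")


def extract_text_parts(raw_bytes, required=REQUIRED):
    """Return the decoded text of every part whose content type is wanted."""
    msg = email.message_from_bytes(raw_bytes)
    texts = []
    for part in msg.walk():
        if part.get_content_type() in required:
            # decode=True undoes base64 / quoted-printable and returns bytes
            payload = part.get_payload(decode=True)
            # Assumption: fall back to utf-8 when no charset is declared
            charset = part.get_content_charset() or "utf-8"
            texts.append(payload.decode(charset, errors="replace"))
    return texts


# Small multipart/alternative message used only to exercise the function
demo = MIMEMultipart("alternative")
demo["Subject"] = "demo"
demo.attach(MIMEText("plain version", "plain"))
demo.attach(MIMEText("<p>html version</p>", "html"))

parts = extract_text_parts(demo.as_bytes())
```

With the demo message above only the text/plain part is kept; adding
'text/html' to the tuple would answer the earlier question the other
way.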

Whether you use the email module or not, you need to join the
retrieved message into a string.  You can use '\n', but if you plan to
push the text back out in an email, '\r\n' is required for the SMTP
sending part.  Your client may or may not convert \n to \r\n at
sending time :)
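
If you would rather not rely on the client, one way to force CRLF line
endings before sending is sketched below.  The helper name is
hypothetical and this is only one possible approach.

```python
import re


def to_crlf(text):
    """Normalise bare LF, bare CR and existing CRLF line endings to CRLF."""
    # CRLF is listed first in the alternation so it is matched as one
    # unit and not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


normalised = to_crlf("line one\nline two\r\nline three")
```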

HTH :)

-- 

Tim Williams


