Parsing Email Headers

MRAB python at mrabarnett.plus.com
Thu Mar 11 18:06:40 EST 2010


T wrote:
> On Mar 11, 3:13 pm, MRAB <pyt... at mrabarnett.plus.com> wrote:
>> T wrote:
>>> All I'm looking to do is to download messages from a POP account and
>>> retrieve the sender and subject from their headers.  Right now I'm 95%
>>> of the way there, except I can't seem to figure out how to *just* get
>>> the headers.  Problem is, certain email clients also include headers
>>> in the message body (i.e. if you're replying to a message), and these
>>> are all picked up as additional senders/subjects.  So, I want to avoid
>>> processing anything from the message body.
>>> Here's a sample of what I have:
>>>                 # For each line in message
>>>                 for j in M.retr(i+1)[1]:
>>>                     # Create email message object from returned string
>>>                     emailMessage = email.message_from_string(j)
>>>                     # Get fields
>>>                     fields = emailMessage.keys()
>>>                     # If email contains "From" field
>>>                     if emailMessage.has_key("From"):
>>>                         # Get contents of From field
>>>                         from_field = emailMessage.__getitem__("From")
>>> I also tried using the following, but got the same results:
>>>                  emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)
>>> Any help would be appreciated!
>> If you're using poplib then use ".top" instead of ".retr".
> 
> I'm still having the same issue, even with .top.  Am I missing
> something?
> 
>                 for j in M.top(i+1, 0)[1]:
>                     emailMessage = email.message_from_string(j)
>                     #emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)
>                     # Get fields
>                     fields = emailMessage.keys()
>                     # If email contains "From" field
>                     if emailMessage.has_key("From"):
>                         # Get contents of From field
>                         from_field = emailMessage.__getitem__("From")
> 
> Is there another way I should be using to retrieve only the headers
> (not those in the body)?

The poplib documentation for the .top method does say:

   """unfortunately, TOP is poorly specified in the RFCs and is
frequently broken in off-brand servers."""

All I can say is that it works for me with my ISP! :-)
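
One possible cause, going only by the quoted code: the loop passes each
line of the message to the parser separately, so any body line that looks
like "From: ..." is parsed as a header of its own one-line message. A
minimal sketch of the alternative, joining the lines and parsing the
message once, is below. It assumes Python 3, where poplib returns the
lines as bytes; the helper name sender_and_subject and the sample message
are made up for illustration.

```python
from email import message_from_bytes


def sender_and_subject(lines):
    """Return (From, Subject) for one message given as a list of lines.

    `lines` is assumed to be the second element of the tuple returned by
    poplib's POP3.retr() or POP3.top(), i.e. a list of byte strings.
    """
    # Rejoin the lines so the parser sees a single message. The first
    # empty line then marks the end of the header block, and anything
    # after it is treated as body text, not as headers.
    message = message_from_bytes(b"\r\n".join(lines))
    return message["From"], message["Subject"]


# Hypothetical sample: a reply whose body quotes another message's headers.
sample = [
    b"From: alice@example.com",
    b"Subject: Re: lunch",
    b"",
    b"On Thursday Bob wrote:",
    b"From: bob@example.com",
    b"Subject: lunch",
]

print(sender_and_subject(sample))
```

With a live connection this would be called as
sender_and_subject(M.top(i+1, 0)[1]). On Python 2, where the lines are
ordinary strings, the equivalent would be
email.message_from_string("\n".join(lines)).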


