Python: Email and Header Parsing: Some Help
David M. Cooke
cookedm+news at physics.mcmaster.ca
Thu Feb 26 22:40:55 CET 2004
At some point, dont bother <dontbotherworld at yahoo.com> wrote:
> I have written this small piece of code. I am a brand
> new player of Python. I had asked some people for
> help, unfortunately not many helped.
> Here is the code I have:
> import email
> import os
> import sys
> fread = open('email_message', 'r')
> msg = email.message_from_file(fread)
> print msg
> #fwrite = open('output','w')
> This way I am able to print the entire email message
> on stdout. The program generates an error if I try
> to write the output to a file: it says the argument
> (here msg) should be a string, not an instance
> like here. How do I write the message to another file?
msg here isn't a string; it's an email.Message object. The print
statement works because print calls str() on the objects passed to it.
To write the message to a file, flatten it to a string with as_string():
fwrite = open('output', 'w')
fwrite.write(msg.as_string())
fwrite.close()
I didn't use str(msg) here, as that defaults to
msg.as_string(unixfrom=True). It depends on whether or not you want the
'From <whoosit>' line at the top (which you do if you're writing an
mbox-format mailbox file).
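Here is a rough sketch of the whole read-then-write round trip, assuming Python 3 rather than the 2.3.3 from the question. email.message_from_file() and Message.as_string() are the real library calls; the copy_message() helper, its keep_unixfrom flag, and the sample message (including its 'From alice@example.com ...' envelope line) are made up for illustration.

```python
import email
import os
import tempfile

# Made-up sample message; in the thread it would be the contents of the
# file 'email_message'. The first line is an mbox-style "From " envelope
# line, which the parser stores separately (see get_unixfrom()).
SAMPLE = (
    "From alice@example.com Thu Feb 26 22:40:55 2004\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: test\n"
    "X-Priority: 3\n"
    "\n"
    "Hello, Bob.\n"
)


def copy_message(src_path, dst_path, keep_unixfrom=False):
    """Hypothetical helper: parse src_path into a Message, write it to dst_path."""
    with open(src_path) as fread:
        msg = email.message_from_file(fread)  # a Message object, not a str
    with open(dst_path, "w") as fwrite:
        # write() needs a string, so flatten the Message first.
        # unixfrom=True would also emit the "From ..." envelope line.
        fwrite.write(msg.as_string(unixfrom=keep_unixfrom))
    return msg


# Usage: put the sample into a temporary 'email_message' file and copy it.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "email_message")
with open(src, "w") as f:
    f.write(SAMPLE)

msg = copy_message(src, os.path.join(workdir, "output"))
print(type(msg).__name__)  # Message
print(msg.get_unixfrom())  # the envelope line, kept on the object
```

Passing keep_unixfrom=True gives you the envelope line as well, which is what str(msg) did in Python 2.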
> 2. I have so many headers in the email message:
> X-Received:
> X-Priority:
> etc., etc.
> I want to parse the headers and the message body
> separately. Does anyone have example code that
> uses the Parser?
I'm not sure what you want; email.message_from_file produces a Message
object, which already separates the headers from the body. You can
then iterate over the headers. For example, to strip out the optional
headers (those starting with 'X-'):
for hdr in [h for h in msg.keys() if h.lower().startswith('x-')]:
    del msg[hdr]
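A slightly fuller sketch, again assuming Python 3 and a made-up sample message. It shows the headers and the body coming out separately (items() and get_payload() on a non-multipart Message), then removes the X- headers. split_headers_and_body() and strip_x_headers() are hypothetical helper names, and matching 'x-' case-insensitively is my own choice, since header names are case-insensitive.

```python
import email

# Made-up message for illustration only.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "X-Received: by mx.example.com\n"
    "X-Priority: 3\n"
    "\n"
    "Body text here.\n"
)


def split_headers_and_body(msg):
    """Return (list of (name, value) pairs, body) for a non-multipart Message."""
    return msg.items(), msg.get_payload()


def strip_x_headers(msg):
    """Delete every header whose name starts with 'X-' (case-insensitive)."""
    # keys() returns a new list, so deleting while looping over it is safe.
    for name in msg.keys():
        if name.lower().startswith("x-"):
            del msg[name]  # removes all occurrences of that header
    return msg


msg = email.message_from_string(RAW)
headers, body = split_headers_and_body(msg)
for name, value in headers:
    print("%s: %s" % (name, value))
print(repr(body))

strip_x_headers(msg)
print(msg.keys())  # ['From', 'To', 'Subject']
```

For a multipart message, get_payload() returns a list of sub-Message objects instead of a string, so you would walk those (or use msg.walk()) to reach the individual bodies.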
> Also, I want to remove the redundant words and all HTML
> tags. Any advice on that?
> I saw some examples using HTMLGen, but I don't have
> HTMLGen with Python on my machine. I have Python
> 2.3.3 on my machine.
HTMLGen won't work, as that generates HTML (hence the name...). To
strip out the HTML tags, a regular expression would probably be
sufficient for simple messages. Otherwise, have a look at the
HTMLParser module (in the standard library).
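Both approaches, sketched for Python 3.4 or later (in 2.3 the module is called HTMLParser rather than html.parser, and the convert_charrefs argument doesn't exist). The helper names and the sample HTML are made up; skipping <script> and <style> contents goes slightly beyond what was asked.

```python
import re
from html.parser import HTMLParser

# Quick-and-dirty: delete anything that looks like <...>. Good enough for
# simple mail bodies, but it is fooled by '>' inside attribute values and
# leaves <script>/<style> contents and entities such as &amp; untouched.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_regex(html):
    return TAG_RE.sub("", html)


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags_parser(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


sample = "<p>Hello <b>Bob</b> &amp; co.</p><style>p {color: red}</style>"
print(strip_tags_regex(sample))   # Hello Bob &amp; co.p {color: red}
print(strip_tags_parser(sample))  # Hello Bob & co.
```

The regex version is fine for a quick cleanup; the parser version is more robust because it actually tokenizes the markup.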
David M. Cooke