[Tutor] Extracting body of all email messages from an mbox file on computer
Kent Johnson
kent37 at tds.net
Sun Sep 14 14:32:03 CEST 2008
On Thu, Sep 11, 2008 at 4:22 AM, grishma govani <grishma20 at gmail.com> wrote:
> I have the e-mails from gmail in a file on my computer. I have used the code
> below to extract all the headers. As you can see, for now I am using text stored
> in a document as my body. I just want to extract the plain text and leave out
> all the HTML, duplicates of the plain text, and all the other information like
> content type, from, etc. Can anyone help me out?
Here is a program that shows the contents of an mbox file. For each
message it prints the subject, then the content type and an excerpt
(the first 200 characters) from each part of the message body. It works
with both single-part and multipart messages.
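import mailbox

def showMbox(mboxPath):
    # Print the subject and the body parts of every message in the mbox file.
    box = mailbox.mbox(mboxPath)
    for msg in box:
        print(msg['Subject'])
        showPayload(msg)
        print()
        print('**********************************')
        print()

def showPayload(msg):
    # For a multipart message get_payload() returns a list of sub-messages;
    # otherwise it returns the body as a string.
    payload = msg.get_payload()
    if msg.is_multipart():
        div = ''
        for subMsg in payload:
            print(div)
            showPayload(subMsg)  # parts can themselves be multipart
            div = '------------------------------'
    else:
        print(msg.get_content_type())
        print(payload[:200])

if __name__ == '__main__':
    showMbox('/path/to/mbox')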
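
To get only the plain text of each message and skip the HTML alternatives, one option is to walk every part and keep the ones whose content type is text/plain. The following is a minimal sketch of that idea, not a tested solution for every mailbox. It assumes Python 3.5 or later, and the names plain_text_parts and extract_bodies are made up for this example. It also assumes that a missing or unknown charset can be treated as UTF-8 with undecodable bytes replaced.

```python
import mailbox


def plain_text_parts(msg):
    """Yield the decoded text of each text/plain part of msg (hypothetical helper)."""
    for part in msg.walk():
        # multipart/* containers hold other parts but no text of their own.
        if part.is_multipart():
            continue
        if part.get_content_type() != 'text/plain':
            continue
        # Skip plain-text files that were sent as attachments.
        if part.get_content_disposition() == 'attachment':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when the charset is missing or unknown.
        charset = part.get_content_charset() or 'utf-8'
        try:
            text = raw.decode(charset, errors='replace')
        except LookupError:
            text = raw.decode('utf-8', errors='replace')
        yield text


def extract_bodies(mbox_path):
    """Return a list of (subject, plain_text_body) pairs (hypothetical helper)."""
    results = []
    for msg in mailbox.mbox(mbox_path):
        body = '\n'.join(plain_text_parts(msg))
        results.append((msg['Subject'], body))
    return results
```

Calling extract_bodies('/path/to/mbox') should then give one (subject, body) pair per message. In a multipart/alternative message the HTML copy is simply never selected, which takes care of the usual plain-text/HTML duplication. Quoted replies inside the plain text are not removed.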
Kent