[Tutor] Extracting body of all email messages from an mbox file on computer
grishma govani
grishma20 at gmail.com
Thu Sep 11 10:22:52 CEST 2008
Yes, I used part of the code from the second link.
I am using the mailbox module too.
I have the e-mails from Gmail in a file on my computer. I have used
the code below to extract all the headers. As you can see, for now I am
using the raw text stored in document as my body. I just want to extract
the plain text and leave out all the HTML, the duplicates of the plain
text, and all the other information such as content type, from, etc.
Can anyone help me out?
import mailbox

# Defined before the loop so the name exists when the loop calls it.
def passthrough_filter(msg, document):
    """This prints the 'from' address and subject of the message and
    returns the document unchanged.
    """
    # getaddr() returns a (name, address) pair; index 1 is the address.
    from_addr = msg.getaddr('From')[1]
    Sub = msg.get('Subject')
    ContentType = msg.get('Content-Type')
    ContentDisp = msg.get('Content-Disposition')
    print "From:",from_addr
    print "Subject:",Sub
    print "Attachment:",None
    print "Body:",document
    print '\n'
    return document

mb = mailbox.UnixMailbox(file('tmp/automated/Feedback', 'r'))
fout = file('Feedback.txt', 'w')  # opened for output, not written to yet
msg = mb.next()
while msg is not None:
    # msg.fp.read() returns the raw body, including any HTML parts.
    document = msg.fp.read()
    document = passthrough_filter(msg, document)
    msg = mb.next()
fout.close()
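
One possible way to get only the plain text, as a rough sketch rather
than a definitive answer: it assumes Python 3 and uses mailbox.mbox
together with the email package's Message.walk(), instead of the
UnixMailbox class used above. The function names plain_text_body and
extract_bodies are made up for this illustration. It keeps only
text/plain parts, so the HTML alternative of the same text and any
attachments are skipped.

```python
import mailbox


def plain_text_body(msg):
    """Return the text/plain parts of a message joined together.

    HTML parts and parts marked as attachments are skipped.
    """
    parts = []
    for part in msg.walk():
        # Multipart containers hold other parts and have no text themselves.
        if part.is_multipart():
            continue
        if part.get_content_type() != 'text/plain':
            continue
        if part.get_content_disposition() == 'attachment':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        try:
            text = payload.decode(charset, errors='replace')
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            text = payload.decode('utf-8', errors='replace')
        parts.append(text)
    return '\n'.join(parts)


def extract_bodies(mbox_path):
    """Yield (from, subject, plain-text body) for each message in the file."""
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            yield msg['From'], msg['Subject'], plain_text_body(msg)
    finally:
        box.close()
```

With the file from the code above, this could be used as
"for sender, subject, body in extract_bodies('tmp/automated/Feedback'):"
and the body written to Feedback.txt inside the loop.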
On 10 Sep 2008, at 22:09, Kent Johnson wrote:
> On Wed, Sep 10, 2008 at 4:06 PM, grishma govani
> <grishma20 at gmail.com> wrote:
>> Hello Everybody,
>>
>> I have been trying to extract the body of all the email messages
>> from an mbox file.
>
> How are you doing this? Have you seen the mailbox module and this
> recipe:
> http://docs.python.org/lib/mailbox-mbox.html
> http://code.activestate.com/recipes/157437/
>
> Kent