[Tutor] How to parse a mailing list thread?

Cameron Simpson cs at zip.com.au
Sun Sep 20 07:41:31 CEST 2015


On 19Sep2015 21:46, chandan kumar <chandankumar.093047 at gmail.com> wrote:
>I am looking for a python module which i can use to parse mailing thread
>and extract some information from it.
>
>Any pointer regarding that would be helpful.

You should describe where the email messages are stored. I'll presume you have 
obtained the messages.

Construct a Message object from each message text. See the email.message 
module:

  https://docs.python.org/3/library/email.message.html#module-email.message
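
As a minimal sketch, assuming you already hold each message as a text string 
(the sample message below is made up for illustration), the standard library's 
email.message_from_string function will build the Message object for you:

```python
import email

# Hypothetical sample message; real text would come from your mail store.
raw_text = """\
Message-ID: <a1@example.org>
From: Alice <alice@example.org>
Subject: Hello

Body text here.
"""

# Parse the raw text into a Message object.
msg = email.message_from_string(raw_text)

# Headers are looked up by name, case-insensitively.
message_id = msg["Message-ID"]
subject = msg["subject"]
print(message_id, subject)
```

If the messages are bytes read from a file, email.message_from_bytes does the 
same job.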

Every message has a Message-ID: header which uniquely identifies it. Replies to 
that message have that id in the In-Reply-To: header. (If you're parsing usenet 
newsgroup messages, you want the References: header - personally I consult 
both.)
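
A rough sketch of consulting both headers might look like this. The helper name 
parent_ids and the sample reply are my own inventions for illustration. It 
lists In-Reply-To first, then the References ids from most recent to oldest:

```python
import email

def parent_ids(msg):
    """Return candidate parent message ids, most likely parent first."""
    in_reply_to = msg.get("In-Reply-To", "")
    references = msg.get("References", "")
    ids = []
    # References lists ancestors oldest first, so walk it in reverse.
    for token in in_reply_to.split() + references.split()[::-1]:
        # Keep only things that look like <id> and skip duplicates.
        if token.startswith("<") and token.endswith(">") and token not in ids:
            ids.append(token)
    return ids

# Hypothetical reply message for demonstration.
reply = email.message_from_string(
    "Message-ID: <c3@example.org>\n"
    "In-Reply-To: <b2@example.org>\n"
    "References: <a1@example.org> <b2@example.org>\n"
    "Subject: Re: Hello\n"
    "\n"
    "Reply body.\n"
)
candidates = parent_ids(reply)
print(candidates)
```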

The complete specification of an email message is RFC 2822 (since superseded 
by RFC 5322):

  http://tools.ietf.org/html/rfc2822

and the email.message module (and the other email.* modules) makes most of it 
easily available. If you need to parse email addresses import the 
"getaddresses" function from the "email.utils" module.
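
For example, a small sketch of pulling the recipients out of a message (the 
sample headers are invented for illustration):

```python
import email
from email.utils import getaddresses

# Hypothetical message with several recipients.
msg = email.message_from_string(
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.org>, carol@example.org\n"
    "Cc: dave@example.org\n"
    "\n"
    "body\n"
)

# get_all returns every header of that name; default to an empty list.
header_values = msg.get_all("To", []) + msg.get_all("Cc", [])

# getaddresses returns a list of (real name, address) pairs.
recipients = getaddresses(header_values)
for name, address in recipients:
    print(repr(name), address)
```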

Construct a graph connecting messages with their replies. You're done!
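
One way to sketch that graph, assuming the messages are available as a list of 
text strings. The function name build_thread_graph and the three sample 
messages are hypothetical, and the parent of a message is taken to be the first 
id from In-Reply-To or References that is actually present in your collection:

```python
import email
from collections import defaultdict

def build_thread_graph(raw_messages):
    """Return (by_id, children, roots) for a list of raw message texts."""
    by_id = {}
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        if msg_id:
            by_id[msg_id] = msg

    children = defaultdict(list)  # parent id -> list of reply ids
    roots = []                    # messages with no known parent
    for msg_id, msg in by_id.items():
        candidates = (msg.get("In-Reply-To") or "").split()
        # References lists ancestors oldest first; try the nearest first.
        candidates += (msg.get("References") or "").split()[::-1]
        parent = next(
            (c for c in candidates if c in by_id and c != msg_id), None
        )
        if parent is None:
            roots.append(msg_id)
        else:
            children[parent].append(msg_id)
    return by_id, children, roots

# Hypothetical three-message thread: a1 <- b2 <- c3.
raw_messages = [
    "Message-ID: <a1@example.org>\nSubject: Hello\n\nfirst\n",
    "Message-ID: <b2@example.org>\nIn-Reply-To: <a1@example.org>\n"
    "Subject: Re: Hello\n\nsecond\n",
    "Message-ID: <c3@example.org>\nReferences: <a1@example.org> "
    "<b2@example.org>\nSubject: Re: Hello\n\nthird\n",
]
by_id, children, roots = build_thread_graph(raw_messages)
print(roots, dict(children))
```

Walking children from each id in roots then gives you the thread as a tree.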

Cheers,
Cameron Simpson <cs at zip.com.au>

