Standard module for parsing emails?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Wed Jul 30 22:25:37 EDT 2008


On Wed, 30 Jul 2008 07:11:45 -0700, Phillip B Oldham wrote:

> Most clients use ">" which is easy to check for, but I've seen some
> which use "|" and some which *don't* quote at all. It's causing us
> nightmares in parsing responses to system-generated emails. I was hoping
> someone might've seen the problem previously and released some code.

My sympathies.

I've even seen clients that prefix new (unquoted) text with the quote 
character ">".

Well, possibly it's not the mail client, but the user. Who knows?

I will sometimes quote text like this:

[quote]
Something quoted.
[end quote]

But I'm writing for a human audience, not for a program.

The simple answer is that you can catch 90% of cases by checking for ">", 
and another 1% by checking for "|". If the email contains HTML, I have 
found that quoted text is sometimes in another colour. As for the rest, 
well, sometimes even human beings can't easily determine what's quoted 
and what isn't. Good luck getting a program to do it.

(Percentages are plucked out of thin air. YMMV.)
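A minimal sketch of the ">"/"|" check, assuming the reply is available as a raw message string and that its plain-text part is what you want to scan. It uses the stdlib email package (email.message_from_string with policy.default, then get_body() and get_content()) to pull out the text/plain body. QUOTE_PREFIXES, plain_text_body(), is_quoted() and split_reply() are hypothetical names chosen for illustration, not an existing module.

```python
import email
from email import policy

# Hypothetical set of prefixes treated as quote markers: ">" and "|" are
# the two cases discussed above. Anything else counts as new text.
QUOTE_PREFIXES = (">", "|")


def plain_text_body(raw_message: str) -> str:
    """Return the text/plain body of a raw message, or '' if there is none."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    # get_body() picks the best candidate body part; restricting the
    # preference list to "plain" means HTML-only messages yield None.
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def is_quoted(line: str) -> bool:
    """Heuristic: a line is quoted if, after leading whitespace,
    it starts with one of QUOTE_PREFIXES."""
    return line.lstrip().startswith(QUOTE_PREFIXES)


def split_reply(body: str) -> tuple[list[str], list[str]]:
    """Split a message body into (new_lines, quoted_lines) by prefix."""
    new_lines, quoted_lines = [], []
    for line in body.splitlines():
        (quoted_lines if is_quoted(line) else new_lines).append(line)
    return new_lines, quoted_lines
```

As the post says, this only covers the easy cases. Unquoted replies, new text that happens to start with ">", attribution lines such as "On <date>, <name> wrote:", and HTML replies that mark quotes by colour (which this sketch skips entirely, returning an empty body) will all be missed or misclassified.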


-- 
Steven



More information about the Python-list mailing list