Standard module for parsing emails?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Wed Jul 30 22:25:37 EDT 2008
On Wed, 30 Jul 2008 07:11:45 -0700, Phillip B Oldham wrote:
> Most clients use ">" which is easy to check for, but I've seen some
> which use "|" and some which *don't* quote at all. It's causing us
> nightmares in parsing responses to system-generated emails. I was hoping
> someone might've seen the problem previously and released some code.
My sympathies.
I've even seen clients that prefix new (unquoted) text with the quote
character ">".
Well, possibly it's not the mail client, but the user. Who knows?
I will sometimes quote text like this:
[quote]
Something quoted.
[end quote]
But I'm writing for a human audience, not for a program.
The simple answer is that you can catch 90% of cases by checking for a
leading ">", and another 1% by checking for a leading "|". If the email
contains HTML, I have found that quoted text is sometimes in another
colour. As for the rest, well, sometimes even human beings can't easily
determine what's quoted and what isn't. Good luck getting a program to do it.
(Percentages are plucked out of thin air. YMMV.)
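A rough sketch of the prefix check described above, assuming a plain-text
body and assuming that ">" and "|" are the only quote markers worth
handling. The names QUOTE_PREFIX and split_quoted are made up for
illustration; this is not code from any mail library, and it will misfire
on exactly the awkward cases mentioned above (unquoted replies, or new
text prefixed with ">").

```python
import re

# Assumed quote markers: ">" or "|" at the start of a line, possibly
# repeated for nested quotes (">>", "> >") and followed by one space.
QUOTE_PREFIX = re.compile(r"^\s*(?:[>|]\s?)+")


def split_quoted(body):
    """Split a plain-text email body into (new_lines, quoted_lines).

    A line counts as quoted if it starts with one of the assumed
    markers; the marker itself is stripped from the quoted text.
    """
    new_lines, quoted_lines = [], []
    for line in body.splitlines():
        match = QUOTE_PREFIX.match(line)
        if match:
            quoted_lines.append(line[match.end():])
        else:
            new_lines.append(line)
    return new_lines, quoted_lines


if __name__ == "__main__":
    sample = "> Most clients use\n| another style\nMy reply.\n>> nested"
    new, quoted = split_quoted(sample)
    print("new:", new)
    print("quoted:", quoted)
```

Anything this returns as "new" still needs a sanity check, since a client
that doesn't quote at all will put the whole original message there.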
--
Steven