simple text file 'parsing' question
Sean Mc Grath
digitome at iol.ie
Sat Jun 19 05:05:53 EDT 1999
Unfortunately, Eudora's .mbx files are not consistent
in how a message starts. The sentinel string is always
the same. From memory, it is something like
"From ???@@???"
The problem is that it sometimes occurs in the middle of a line.
As long as you allow the sentinel to occur anywhere on a line
and keep the bit to the left of the sentinel, you can skip from
there to the next blank line: everything in between is headers.
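
Roughly, the idea above could be sketched as follows. This is only a
minimal sketch: the function name split_bodies is made up for
illustration, and the sentinel value is the one quoted from memory
above, so check it against a real .mbx file before relying on it.

```python
# Sentinel as recalled above; an assumption, verify against a real file.
SENTINEL = "From ???@@???"

def split_bodies(text, sentinel=SENTINEL):
    """Return the message bodies found in text, headers stripped."""
    bodies = []
    current = None        # lines of the body being collected
    in_headers = False    # True between a sentinel and the next blank line
    for line in text.splitlines():
        if in_headers:
            # Skip header lines; the first blank line ends the headers.
            if line.strip() == "":
                in_headers = False
            continue
        pos = line.find(sentinel)
        if pos != -1:
            # The sentinel may sit mid-line: the text to its left still
            # belongs to the previous message body.
            if current is not None:
                if line[:pos]:
                    current.append(line[:pos])
                bodies.append("\n".join(current))
            current = []
            in_headers = True
            continue
        if current is not None:
            current.append(line)
    if current is not None:
        bodies.append("\n".join(current))
    return bodies
```

Anything before the first sentinel is dropped, and a sentinel that
shows up inside a header block is ignored.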
BTW, two weeks ago a new programmer with two years of
college joined my company. No experience in Python. No
experience in XML. Two weeks later he has:
- A Python parser for Eudora .mbx mail archives that
  uses rfc822.py to tease out the headers
- An XML transformation script in Python
- Python reporting scripts used to help create
  a DTD for rfc822 e-mail
- The beginnings of a down-translate to Folio Views
  in Python.
Does this language make programmers productive or what!!!!!
On Fri, 18 Jun 1999 23:44:11 -0700, "Phil Mayes"
<nospam at bitbucket.com> wrote:
>KP wrote in message <376B1AAC.19FE8BCE at mysolution.com>...
>>Here's my dilemma: a directory filled (200+) with small emails. My goal
>>is to strip all the headers and combine them into one file. I can read
>>all the files just fine and write them all to one file, but I cannot
>>discern how to strip the headers. The answer must be very simple, yet
>>I cannot see it. Can anyone give a few pointers on how to do it, or
>>what module might be best? Thank you.
>>Ken
>
>
>A raw email always has a blank line between the header and the body.
>(To be pedantic, it should also have all its lines ending in CRLF.)
>So you can read it in and find the gap by looking for 2 EOLs:
>
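>with open('c:\\apps\\eudora\\in.mbx', 'r') as f:
>    text = f.read()
># the first blank line separates the headers from the body
>x = text.find('\n\n')
># find() returns -1 when there is no blank line, i.e. no body
>body = text[x+2:] if x != -1 else ''
># append body to output file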
>--
>Phil Mayes pmayes AT olivebr DOT com