Efficient scanning of mbox files
Paul Moore
gustav at morpheus.demon.co.uk
Mon Nov 11 15:46:20 EST 2002
Martin Franklin <mfranklin1 at gatwick.westerngeco.slb.com> writes:
> I ran the above example on my Python folder (7000+ messages...)
> it took 12 seconds to process. Then I changed the
> if FROM_RE.match(line):
>
> to
>
> if line.startswith("From "):
Trouble is, I can't do this: in the mbox files I've got, body lines
starting with "From " are *not* reliably quoted with an initial
">" :-(
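To illustrate why a plain startswith("From ") test is not enough for such files, here is a minimal sketch of a stricter separator check. The pattern below is hypothetical (it is not the FROM_RE from the earlier post) and assumes separator lines follow the common "From sender date" layout with an asctime-style date. Some mbox variants append a timezone or other fields, which this pattern would reject.

```python
import re

# Hypothetical pattern, assumed for illustration only. It expects lines like:
#   From sender@host Mon Nov 11 15:46:20 2002
SEPARATOR_RE = re.compile(
    rb"From \S+ +"                         # envelope sender
    rb"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "   # day of week
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    rb"[ \d]\d \d\d:\d\d:\d\d \d{4}"       # day of month, time, year
    rb"\s*$"                               # tolerate LF or CRLF endings
)

def is_separator(line):
    """Return True if a line (bytes) looks like an mbox message separator."""
    return SEPARATOR_RE.match(line) is not None
```

A body line such as b"From what I can tell...\n" starts with "From " but fails the date part of the pattern, so it is not treated as the start of a new message.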
> Then I slurped the file into a cStringIO.StringIO object and got it down
> to 5 seconds.....
I'll have a look at slurping, though. I was worrying because I have a
mix of CRLF and LF line endings (some files have one, some the
other). I wasn't sure what effect that would have - but thinking about
it, as long as I read files in binary mode, seek offsets should be the
same as byte positions in the in-memory string, so things should be
OK.
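A minimal sketch of that reasoning, assuming the whole file fits in memory. The function names are made up for illustration, and the separator test is deliberately the simple "line starts with From " check; a stricter pattern can be swapped in for files whose body lines are not ">"-quoted.

```python
import re

# Simplified, hypothetical separator test: any line starting with "From ".
FROM_LINE = re.compile(rb"^From ", re.MULTILINE)

def message_offsets(path):
    """Return the byte offset of every line that starts with 'From '."""
    # Binary mode: no newline translation, so CRLF stays as two bytes and
    # an index into `data` is also a valid seek offset into the file.
    with open(path, "rb") as f:
        data = f.read()
    return [m.start() for m in FROM_LINE.finditer(data)]

def read_at(path, offset, size):
    """Seek to a byte offset in the file and read `size` bytes from there."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Because nothing is translated on the way in, read_at(path, offset, 5) returns b"From " for every offset reported by message_offsets, whether the file uses CRLF, LF, or a mixture of the two.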
Thanks for the suggestions,
Paul
--
This signature intentionally left blank