Efficient scanning of mbox files
Martin Franklin
mfranklin1 at gatwick.westerngeco.slb.com
Mon Nov 11 07:06:27 EST 2002
On Mon, 2002-11-11 at 11:42, Moore, Paul wrote:
>
> def add_group(self, id, file):
>     print "Opening file", file, "for group", id
>     fp = open(file, "rb")
>     posns = []
>     oldpos = 0
>     n = 0
>     while 1:
>         line = fp.readline()
>         if not line: break
>         if FROM_RE.match(line):
>             n += 1
>             posns.append(oldpos)
>         oldpos = fp.tell()
>     fp.close()
>     posns.append(oldpos)
>     print "Group", id, "- articles(posns) =", n, len(posns)
>     self.groups[id] = (file, n, posns)
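For readers on a current Python, here is a minimal Python 3 sketch of the same indexing idea, written as a standalone function rather than a method. It records the byte offset of every line that starts a message, plus the end-of-file offset, so any message can later be read back with one seek and one read. The pattern `FROM_RE = re.compile(rb"From ")` is an assumption (the post never shows the real definition), and `index_mbox` and `read_message` are hypothetical names.

```python
import re

# Assumption: the post never shows FROM_RE; a literal "From " is used here.
FROM_RE = re.compile(rb"From ")


def index_mbox(path):
    """Return (count, offsets) for the mbox file at `path`.

    offsets[i] is the byte offset where message i starts; one extra
    entry holds the end-of-file offset, so len(offsets) == count + 1.
    """
    offsets = []
    count = 0
    oldpos = 0  # offset of the line about to be read
    with open(path, "rb") as fp:
        while True:
            line = fp.readline()
            if not line:  # empty bytes means end of file
                break
            if FROM_RE.match(line):
                count += 1
                offsets.append(oldpos)  # start of this separator line
            oldpos = fp.tell()
    offsets.append(oldpos)  # sentinel: end of the last message
    return count, offsets


def read_message(path, offsets, i):
    """Return the raw bytes of message i using offsets from index_mbox."""
    with open(path, "rb") as fp:
        fp.seek(offsets[i])
        return fp.read(offsets[i + 1] - offsets[i])
```

The extra end-of-file entry is what makes `offsets[i + 1] - offsets[i]` valid for the last message as well.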
Paul,
I ran the above example on my Python folder (7000+ messages); it
took 12 seconds to process. Then I changed the line
    if FROM_RE.match(line):
to
    if line.startswith("From "):
and got a 2 second speed-up (from 12 seconds down to 10).
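To see why this change helps, the two tests can be put side by side. This sketch assumes `FROM_RE` is just the literal `From ` anchored at the start of the line; under that assumption `re.match` and `bytes.startswith` accept exactly the same lines, so the swap changes only speed, not results. If the real pattern checked more (for example the date that follows), `startswith` would be a looser test. The function names are hypothetical, and the timing loop prints whatever your machine measures; no particular numbers are implied.

```python
import re
import timeit

# Assumption: FROM_RE is only the literal "From " (the post never shows it).
FROM_RE = re.compile(rb"From ")


def is_from_regex(line):
    # re.match only matches at the start of the string, like startswith.
    return FROM_RE.match(line) is not None


def is_from_startswith(line):
    # Plain bytes method, no regular expression machinery involved.
    return line.startswith(b"From ")


if __name__ == "__main__":
    line = b"From alice@example.com Mon Nov 11 07:06:27 2002\n"
    for name, fn in [("regex", is_from_regex), ("startswith", is_from_startswith)]:
        secs = timeit.timeit(lambda: fn(line), number=100_000)
        print(f"{name:>10}: {secs:.3f} s per 100,000 calls")
```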
Then I slurped the whole file into a cStringIO.StringIO object and
scanned that instead, which got it down to 5 seconds.
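The slurping step can be sketched in Python 3 as follows. `cStringIO` no longer exists there, so `io.BytesIO` stands in for `cStringIO.StringIO`; the function name `index_mbox_in_memory` is hypothetical. The file is read in a single call and the same readline/tell loop then runs against the in-memory buffer, using the `startswith` test in place of the regex. Whether this still beats Python 3's own buffered file reads is something to measure rather than assume, and it costs memory proportional to the size of the mailbox.

```python
import io


def index_mbox_in_memory(path):
    """Slurp the mbox at `path` into memory, then index it.

    Returns (count, offsets): one offset per "From " line plus a final
    end-of-data offset, so len(offsets) == count + 1.
    """
    with open(path, "rb") as f:
        # One large read; io.BytesIO replaces Python 2's cStringIO.StringIO.
        buf = io.BytesIO(f.read())
    offsets = []
    oldpos = 0  # offset of the line about to be read
    while True:
        line = buf.readline()
        if not line:  # end of the in-memory data
            break
        if line.startswith(b"From "):
            offsets.append(oldpos)
        oldpos = buf.tell()
    offsets.append(oldpos)  # sentinel: end of the last message
    return len(offsets) - 1, offsets
```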
Just some thoughts
Martin