Finding messages in huge mboxes
Bastiaan Welmers
haasje at welmers.net
Mon Feb 2 15:37:05 EST 2004
Hi,
I wondered if anyone has ever met this same mbox issue.
I'm having the following problem:
I need to find messages in huge mbox files (50MB or more).
The following approach is (of course?) not very usable:
import mailbox

fp = open("mbox", "r")
archive = mailbox.UnixMailbox(fp)
i=0
# skip (and parse) every message before the one needed
while i < message_number_needed:
    i+=1
    archive.next()
needed_message = archive.next()
Especially because I often need messages at the end
of the MBOX file.
So I tried the following (scanning backwards from the end
of the file for "From " lines with readline()):
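i=0
j=0
while 1:
    i+=1
    fp.seek(-i, 2)  # whence=2: offset is relative to the end of the file
    pos = fp.tell()  # where the line read below starts
    line = fp.readline()
    if not line:
        break
    if line[:5] == 'From ':
        j+=1
        if j == total_messages - message_number_needed:
            archive.seekp = pos  # start of the "From " line itself
            message = archive.next()
            # message found
            break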
But this also seems to be slow and CPU-intensive.
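
The backward scan above does one seek() and one readline() for every byte of the file it walks over, which is probably where much of the time goes. As a rough sketch only, the same search can read fixed-size blocks from the end and look for "From " at the start of a line inside each block. The function name last_from_offsets and the block_size parameter are made up for illustration. The sketch assumes that every line beginning with "From " is a message separator (body lines quoted as ">From "), which is the same assumption the loop above makes, and it opens the file in binary mode so the offsets are plain byte positions.

```python
import os


def last_from_offsets(path, count, block_size=64 * 1024):
    """Return byte offsets of the last `count` lines starting with b"From ".

    Offsets are ordered from the end of the file backwards, so the first
    entry belongs to the last message. Hypothetical helper, not part of
    the mailbox module.
    """
    marker = b"\nFrom "
    offsets = []
    block = b""
    with open(path, "rb") as fp:
        fp.seek(0, os.SEEK_END)
        pos = fp.tell()   # everything from pos to EOF has been searched
        tail = b""        # first bytes of the block read previously
        while pos > 0 and len(offsets) < count:
            start = max(0, pos - block_size)
            fp.seek(start)
            # Append a few bytes of the later block so that a marker
            # straddling the block boundary is still seen exactly once.
            block = fp.read(pos - start) + tail
            end = len(block)
            while len(offsets) < count:
                hit = block.rfind(marker, 0, end)
                if hit < 0:
                    break
                offsets.append(start + hit + 1)  # skip the newline
                end = hit
            tail = block[:len(marker) - 1]
            pos = start
        # The first message has no newline in front of its "From " line.
        if pos == 0 and len(offsets) < count and block[:5] == b"From ":
            offsets.append(0)
    return offsets
```

With total_messages and message_number_needed as above, last_from_offsets("mbox", total_messages - message_number_needed)[-1] would be the byte offset to seek to, playing the role of the value assigned to archive.seekp.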
Anyone who has a better idea?
Regards,
Bastiaan Welmers