Finding messages in huge mboxes
Miklós
nospam at nowhere.hu
Mon Feb 2 15:56:39 EST 2004
What about putting it into a database like MySQL? <pyWink>
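
A minimal sketch of that idea, assuming SQLite (from the standard library) stands in for MySQL and that only the byte offsets are stored, not the message bodies. The function names build_index and fetch_message and the table layout are hypothetical. It also assumes, like the code quoted below, that any line starting with "From " begins a new message.

```python
import sqlite3


def build_index(mbox_path, db_path):
    """Scan the mbox once and store each message's byte offset and length."""
    offsets = []
    pos = 0
    with open(mbox_path, "rb") as fp:
        for line in fp:
            # Assumption: every line starting with "From " is a separator.
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    offsets.append(pos)  # sentinel: end of file

    rows = [(n, offsets[n], offsets[n + 1] - offsets[n])
            for n in range(len(offsets) - 1)]

    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS msg "
                "(num INTEGER PRIMARY KEY, start INTEGER, length INTEGER)")
    con.execute("DELETE FROM msg")  # rebuild from scratch
    con.executemany("INSERT INTO msg VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()
    return len(rows)


def fetch_message(mbox_path, db_path, num):
    """Return the raw bytes of message number `num` (0-based)."""
    con = sqlite3.connect(db_path)
    row = con.execute("SELECT start, length FROM msg WHERE num = ?",
                      (num,)).fetchone()
    con.close()
    if row is None:
        raise IndexError(num)
    with open(mbox_path, "rb") as fp:
        fp.seek(row[0])          # jump straight to the message
        return fp.read(row[1])
```

The slow scan happens once, in build_index. After that, each lookup is one indexed query plus one seek, no matter where in the file the message sits. The index has to be rebuilt, or extended, whenever the mbox changes.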
Miklós
"Bastiaan Welmers" <haasje at welmers.net> wrote in message
news:401eb54c$0$315$e4fe514c at news.xs4all.nl...
> Hi,
>
> I wondered if anyone has ever met this same mbox issue.
>
> I'm having the following problem:
>
> I need to find messages in huge mbox files (50 MB or more).
> The following way is (of course?) not very usable:
>
> import mailbox
>
> fp = open("mbox", "r")
> archive = mailbox.UnixMailbox(fp)
> i = 0
> # skip every message before the one we want
> while i < message_number_needed:
>     i += 1
>     archive.next()
>
> needed_message = archive.next()
>
> Especially because I often need messages at the end
> of the MBOX file.
> So I tried the following (scanning messages backwards
> on found "From " lines with readline())
>
> i = 0
> j = 0
> while 1:
>     i += 1
>     fp.seek(-i, 2)  # whence=2: offset is relative to the end of the file
>     line = fp.readline()
>     if not line:
>         break
>     if line[:5] == 'From ':
>         j += 1
>         if j == total_messages - message_number_needed:
>             # point the mailbox at the start of this "From " line
>             archive.seekp = fp.tell() - len(line)
>             message = archive.next()
>             # message found
>             break
>
> But also seems to be slow and CPU consuming.
>
> Anyone who has a better idea?
>
> Regards,
>
> Bastiaan Welmers
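
One likely reason the backward scan quoted above is slow is that it moves back a single byte per iteration and calls readline() each time, so the same bytes are read over and over. A hedged alternative sketch: read the file backwards in large blocks and search each block for "\nFrom ". The function name offset_from_end and the chunk_size default are hypothetical, and the sketch assumes every line starting with "From " is a message separator.

```python
import os


def offset_from_end(path, k, chunk_size=64 * 1024):
    """Return the byte offset of the k-th message counted from the end
    of the file (k=1 is the last message), or None if there are fewer."""
    marker = b"\nFrom "
    found = 0
    buf = b""
    with open(path, "rb") as fp:
        fp.seek(0, os.SEEK_END)
        pos = fp.tell()
        tail = b""  # first bytes of the block read previously (later in file)
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            fp.seek(pos)
            # Append the tail so a marker split across two blocks is seen.
            buf = fp.read(step) + tail
            idx = len(buf)
            while True:
                idx = buf.rfind(marker, 0, idx)
                if idx < 0:
                    break
                found += 1
                if found == k:
                    return pos + idx + 1  # +1 skips the leading newline
            tail = buf[:len(marker) - 1]
    # The first message in the file has no newline in front of it.
    if buf.startswith(b"From "):
        found += 1
        if found == k:
            return 0
    return None
```

With the names from the quoted code, k would be total_messages - message_number_needed. The returned offset can then be passed to fp.seek() before handing the file to a parser.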