Building a word list from multiple files
Jeff Shannon
jeff at ccvcorp.com
Thu Nov 18 23:41:41 EST 2004
Manu wrote:
>hi,
>
>
>>1) How large are the files you are reading (e.g. can they
>>fit in memory)?
>>
>>
>
>The files are email messages.
>I will be using the builtin email module to extract only the parts
>whose content type is plain text or HTML. So no line-by-line
>processing is possible unless
>I write my own parser for email.
>
>
The email package can do that parsing for you -- it's not too difficult
to feed it a raw message file and get back only the text and/or html
payload.
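
As a rough sketch of that step (not tested against your actual messages): the helper name extract_text_parts is made up for illustration, and I'm assuming each message is available as raw bytes and that the modern policy-based API of the email package is acceptable. It walks the MIME tree and keeps only the text/plain and text/html parts.

```python
import email
from email import policy


def extract_text_parts(raw_bytes):
    """Return a list of (content_type, text) pairs for the text/plain
    and text/html parts of one raw email message.

    extract_text_parts is a hypothetical helper name, not part of the
    email package.
    """
    # policy.default yields EmailMessage objects, whose get_content()
    # decodes text parts to str for us.
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    # walk() visits the message itself and every nested MIME part;
    # multipart containers are skipped by the content-type check.
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html"):
            parts.append((ctype, part.get_content()))
    return parts
```

Reading a file in binary mode and passing its contents to this function should give you the text to split into words; attachments that happen to be text/plain would be included too, so filter on part.get_content_disposition() if that matters.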
>>If not, preprocess the files and use shelve to save a
>>dictionary that has already been processed. When you
>>
>>
>
>This is what I was planning to do. Once the processing is done for a
>set of files they are never processed again. I was going to store the
>dict as a string in a file and then use eval() to get it back.
>
>
Use the shelve module instead of eval()ing it yourself -- the shelve
authors have already done all of the hard work for you. It'll act
almost like a regular dictionary, but is extremely easy to save to disk
and reload later.
This is why Python is called "batteries included". :)
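
A minimal sketch of what that could look like, assuming you key the shelf by file name and store one word-count dict per file. The function names and the per-file layout are my own invention for illustration, not something shelve requires.

```python
import shelve


def store_word_counts(shelf_path, file_id, words):
    """Count the words for one file and save them in the shelf.

    Returns False without touching the shelf if file_id was already
    processed, True otherwise. (Hypothetical helper, for illustration.)
    """
    with shelve.open(shelf_path) as db:
        if file_id in db:          # shelf keys must be strings
            return False
        counts = {}
        for word in words:
            counts[word] = counts.get(word, 0) + 1
        db[file_id] = counts       # the value is pickled for you
        return True


def load_word_counts(shelf_path, file_id):
    """Return the stored word-count dict for file_id, or None."""
    with shelve.open(shelf_path) as db:
        return db.get(file_id)
```

Because the values are pickled, there is no need to build or eval() a string yourself. One thing to watch: changing a dict after fetching it from the shelf is not saved unless you assign it back (or open the shelf with writeback=True).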
Jeff Shannon
Technician/Programmer
Credit International