Looking for lots of words in lots of files

Calvin Spealman ironfroggy at socialserve.com
Wed Jun 18 16:42:31 CEST 2008


Upload, wait, and google them.

Seriously tho, aside from using a real indexer, I would build a set
of the words I'm looking for, then loop over each file, looping
over the words in it and doing a quick membership check against the
set (set lookups are constant-time on average). On each hit, add the
word to a dict mapping file names to lists of words found, and stop
reading that file once its list reaches 10 words. I don't think that
would be a complicated solution, and it shouldn't perform too badly.
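A rough sketch of that set-based scan, for illustration only. The
function name find_words, the regex tokenizer, the case-insensitive
matching and the "10 distinct words" rule are all assumptions, not
anything the original poster specified:

```python
import re

# Assumed tokenizer: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def find_words(paths, wanted, limit=10):
    """Map each file path to up to `limit` distinct target words found in it.

    Hypothetical helper; matching is case-insensitive by assumption.
    """
    targets = {w.lower() for w in wanted}  # one set, built once
    found = {}
    for path in paths:
        hits = []
        seen = set()
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:  # read line by line so we can stop early
                for word in WORD_RE.findall(line.lower()):
                    if word in targets and word not in seen:
                        seen.add(word)
                        hits.append(word)
                        if len(hits) >= limit:
                            break
                if len(hits) >= limit:
                    break  # enough words found; stop reading this file
        if hits:
            found[path] = hits
    return found
```

Each file is read at most once and each word costs one set lookup, so
1000 target words cost about the same per word as 10 would.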

If you need to run this more than once, use an indexer.

If you only need to use it once, use an indexer, so you learn how for  
next time.
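To show why an index pays off on repeat runs, here is a toy in-memory
inverted index (word -> set of files). It is only a stand-in for a
real indexer, which would also handle persistence, stemming and so on.
The names build_index and query, and the tokenizer, are hypothetical:

```python
import re
from collections import defaultdict

# Assumed tokenizer, same rule as a simple scan would use.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def build_index(paths):
    """Toy inverted index: lower-cased word -> set of file paths containing it."""
    index = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                for word in WORD_RE.findall(line.lower()):
                    index[word].add(path)
    return index

def query(index, wanted, limit=10):
    """Return file path -> up to `limit` of the wanted words it contains."""
    found = defaultdict(list)
    # Sorted only so results are deterministic.
    for word in sorted({w.lower() for w in wanted}):
        for path in index.get(word, ()):
            if len(found[path]) < limit:
                found[path].append(word)
    return dict(found)
```

The files are read once in build_index; after that, every query only
touches the index entries for the wanted words, never the files.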

On Jun 18, 2008, at 10:28 AM, brad wrote:

> Just wondering if anyone has ever solved this efficiently... not  
> looking for specific solutions tho... just ideas.
>
> I have one thousand words and one thousand files. I need to read  
> the files to see if some of the words are in the files. I can stop  
> reading a file once I find 10 of the words in it. It's easy for me  
> to do this with a few dozen words, but a thousand words is too  
> large for an RE and too inefficient to loop, etc. Any suggestions?
>
> Thanks