Looking for lots of words in lots of files

Kris Kennaway kris at FreeBSD.org
Wed Jun 18 11:01:38 EDT 2008


Calvin Spealman wrote:
> Upload, wait, and google them.
> 
> Seriously though, aside from using a real indexer, I would build a set 
> of the words I'm looking for, and then loop over each file, looping 
> over the words in it and doing quick checks for containment in the set. 
> On a hit, add the word to a dict mapping file names to lists of words 
> found, until the list reaches 10 entries. I don't think that would be a 
> complicated solution and it shouldn't be terrible at performance.
> 
> If you need to run this more than once, use an indexer.
> 
> If you only need to use it once, use an indexer, so you learn how for 
> next time.
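
For reference, the set-based approach quoted above could look roughly 
like the sketch below. The function name find_words, the \w+ 
tokenization, and the choice to record each word only once per file are 
assumptions made for illustration, not something specified in the 
thread.

```python
import re

# Assumption: a "word" is a run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")

def find_words(paths, wanted, limit=10):
    """Map each file name to up to `limit` distinct wanted words found in it."""
    wanted = set(wanted)              # set membership tests are O(1) on average
    found = {}
    for path in paths:
        hits = []                     # words found, in order of first appearance
        seen = set()                  # avoids recording the same word twice
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for word in WORD_RE.findall(line):
                    if word in wanted and word not in seen:
                        seen.add(word)
                        hits.append(word)
                        if len(hits) >= limit:
                            break
                if len(hits) >= limit:
                    break             # stop reading this file early
        if hits:
            found[str(path)] = hits
    return found
```

Calling find_words(list_of_paths, ["foo", "bar"]) returns a dict with an 
entry only for files that contain at least one of the words.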

If you can't use an indexer, and performance matters, evaluate using 
grep and a shell script.  Seriously.

grep is a couple of orders of magnitude faster than Python at matching 
patterns (especially regexps) against strings in files.  Even if you 
invoke grep multiple times, it is still likely to be faster than a 
"maximally efficient" single pass over the file in Python.  This 
realization was disappointing to me :)
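
If you would rather drive grep from Python than from a shell script, 
something like the following might serve as a starting point. It is only 
a sketch and has not been benchmarked: the helper name grep_words is 
made up, it assumes a grep on PATH that supports the -F, -w, -o and -f 
options, and it treats the search terms as fixed strings rather than 
regexps.

```python
import os
import subprocess
import tempfile

def grep_words(path, words, limit=10):
    """Return up to `limit` distinct words from `words` that grep finds in `path`."""
    # Write one search term per line to a temporary pattern file for grep -f.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as pat:
        pat.write("\n".join(words) + "\n")
    try:
        # -F: fixed strings, -w: whole-word matches only,
        # -o: print only the matched text, -f: read patterns from a file.
        proc = subprocess.run(
            ["grep", "-F", "-w", "-o", "-f", pat.name, "--", path],
            capture_output=True,
            text=True,
        )
    finally:
        os.unlink(pat.name)
    # grep exits with 0 on a match, 1 on no match, and >1 on an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    hits = []
    for word in proc.stdout.splitlines():
        if word not in hits:
            hits.append(word)
            if len(hits) >= limit:
                break
    return hits
```

This runs one grep process per file; whether that beats a pure-Python 
pass will depend on file sizes and the number of words, so measure 
before committing to it.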

Kris


