search speed

Stefan Behnel stefan_ml at behnel.de
Fri Jan 30 15:40:00 EST 2009


D'Arcy J.M. Cain wrote:
> On Fri, 30 Jan 2009 15:46:33 +0200
> Justin Wyer <justinwyer at gmail.com> wrote:
>> $ find <path_to_dirs_containing_files> -name "*" -exec grep -nH "LF01" {} \;
>> | cut -d ":" -f 1 | sort | uniq
> 
> I know this isn't a Unix group but please allow me to suggest instead;
> 
>   $ grep -lR LF01 <path_to_dirs_containing_files>

That's very good advice. I recently had to pull some statistics from a
couple of log files, some of which were gzip compressed. The obvious Python
program just eats your first CPU's cycles parsing the data into strings
while the disk sits idle, but using the subprocess module to spawn a couple
of gzgreps in parallel to find the relevant lines, and then using Python to
extract and aggregate the relevant information from them, does the job in
no time.
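The approach above can be sketched roughly as follows. This is a minimal
illustration with made-up file names, contents, and a hypothetical "ERROR"
search pattern; it uses plain grep on uncompressed files so it runs
anywhere, but for .gz logs you would substitute zgrep (or gzgrep) for grep:

    import os
    import subprocess
    import tempfile
    from collections import Counter

    # Hypothetical setup: two small log files standing in for the real ones.
    tmpdir = tempfile.mkdtemp()
    for name, lines in [("a.log", ["ERROR x", "INFO y", "ERROR z"]),
                        ("b.log", ["ERROR x", "WARN q"])]:
        with open(os.path.join(tmpdir, name), "w") as f:
            f.write("\n".join(lines) + "\n")

    # Spawn one grep per file so the filtering runs in parallel,
    # outside the Python interpreter.
    procs = [subprocess.Popen(["grep", "ERROR", os.path.join(tmpdir, name)],
                              stdout=subprocess.PIPE, text=True)
             for name in ("a.log", "b.log")]

    # Python only sees the (much smaller) matching lines and aggregates them.
    counts = Counter()
    for p in procs:
        out, _ = p.communicate()
        for line in out.splitlines():
            counts[line.split()[1]] += 1

    print(counts)

The parallelism here is free: each grep is a separate OS process, so the
expensive scanning (and, with zgrep, the decompression) overlaps across
CPUs while Python just reads the filtered results.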

Stefan



More information about the Python-list mailing list