Efficient grep using Python?

P at draigBrady.com
Fri Dec 17 09:22:34 EST 2004


sf wrote:
> The point is that when you have 100,000s of records, this grep becomes
> really slow?

There are performance bugs in current versions of grep when handling
multibyte characters, and these are only now being addressed.
To work around them, run `export LANG=C` first.
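If grep is being driven from Python, the same workaround can be applied per call by passing a modified environment to `subprocess.run` (Python 3.5+), rather than changing the locale of the whole session. This is a minimal sketch: `run_grep_c_locale` is a hypothetical helper, and it also sets `LC_ALL=C` on the assumption that an `LC_ALL` already set by the user would otherwise take precedence over `LANG`.

```python
import os
import subprocess


def run_grep_c_locale(args):
    """Run grep with the given arguments under the C locale and return
    the selected lines (hypothetical helper, for illustration only)."""
    # Copy the caller's environment and force the C locale.
    # LC_ALL is set too because, when present, it overrides LANG.
    env = dict(os.environ, LANG="C", LC_ALL="C")
    result = subprocess.run(
        ["grep"] + list(args),
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # grep exits with 1 when no lines are selected, which is not an error here;
    # 2 (or anything else) signals a real failure.
    if result.returncode not in (0, 1):
        raise subprocess.CalledProcessError(result.returncode, result.args)
    return result.stdout.splitlines()
```

For example, `run_grep_c_locale(["-Fvf", "B", "A"])` corresponds to running `grep -Fvf B A` after `export LANG=C`.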

In my experience grep doesn't scale well for this, since it's O(n^2).
See the timings below. Note that A and B are randomized versions of
/usr/share/dict/words, and are therefore the worst case for the
sort method.

$ wc -l A B
   45427 A
   45427 B

$ export LANG=C

$ time grep -Fvf B A
real    0m0.437s

$ time sort A B B | uniq -u
real    0m0.262s

$ rpm -q grep coreutils
grep-2.5.1-16.1
coreutils-4.5.3-19
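Since the question is about doing this in Python, here is a minimal sketch of the same "lines of A not in B" filter, with hypothetical function names. `only_in_a_counting` mirrors the `sort A B B | uniq -u` trick (lines only in A occur exactly once in A+B+B, assuming A has no duplicate lines), `only_in_a_set` uses a set for roughly linear time, and `only_in_a_naive` is a nested scan shown only for contrast. One caveat: `grep -F` treats the patterns as substrings, so `grep -Fvf B A` also drops lines of A that merely contain a line of B; the whole-line equivalent is `grep -Fxvf B A`. The functions below compare whole lines, as `uniq` does. I haven't timed them against the figures above.

```python
from collections import Counter


def only_in_a_counting(a_lines, b_lines):
    """Mirror `sort A B B | uniq -u`: count A once and B twice, then keep
    the lines seen exactly once. Output is sorted, like sort's."""
    counts = Counter(a_lines)
    for line in b_lines:
        counts[line] += 2  # B appears twice in `sort A B B`
    return sorted(line for line, n in counts.items() if n == 1)


def only_in_a_set(a_lines, b_lines):
    """Keep lines of A that exactly match no line of B, in A's order.
    Building the set is O(len(B)) and each lookup is O(1) on average."""
    exclude = set(b_lines)
    return [line for line in a_lines if line not in exclude]


def only_in_a_naive(a_lines, b_lines):
    """For contrast: `in` on a list scans all of B for every line of A,
    so this is O(len(A) * len(B))."""
    b_list = list(b_lines)
    return [line for line in a_lines if line not in b_list]


def read_lines(path):
    """Read a file into a list of lines without trailing newlines."""
    with open(path) as f:
        return f.read().splitlines()
```

Usage would be along the lines of `only_in_a_set(read_lines("A"), read_lines("B"))`. The set version keeps A's original order, which the sort-based approach cannot.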

-- 
Pádraig Brady - http://www.pixelbeat.org
--



More information about the Python-list mailing list