Efficient grep using Python?
P at draigBrady.com
Fri Dec 17 09:22:34 EST 2004
sf wrote:
> The point is that when you have 100,000s of records, this grep becomes
> really slow?
Current versions of grep have performance bugs when handling
multibyte characters, and these are only now being addressed.
To work around them, run `export LANG=C` first.
In my experience grep does not scale for this kind of job,
since it behaves as O(n^2) here.
See the timings below. Note that A and B are randomized versions
of /usr/share/dict/words, and are therefore the worst case for
the sort method.
$ wc -l A B
45427 A
45427 B
$ export LANG=C
$ time grep -Fvf B A
real 0m0.437s
$ time sort A B B | uniq -u
real 0m0.262s
$ rpm -q grep coreutils
grep-2.5.1-16.1
coreutils-4.5.3-19
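
Since the question was about doing this from Python, here is a
minimal sketch of the same "lines in A that are not in B" job
using a set. The names lines_only_in_a and diff_files are made up
for illustration, and this is not timed against the commands
above. It compares whole lines (closer to `grep -Fxvf B A` than to
the substring matching of `grep -Fvf B A`), and unlike
`sort A B B | uniq -u` it keeps the order of A and keeps lines
that are duplicated within A.

```python
def lines_only_in_a(a_lines, b_lines):
    """Return the lines of a_lines that do not occur in b_lines.

    Comparison is on whole lines; the order of a_lines is preserved.
    """
    # Building the set costs one pass over B; each membership test
    # afterwards is a hash lookup rather than a scan of B.
    seen_in_b = set(b_lines)
    return [line for line in a_lines if line not in seen_in_b]


def diff_files(path_a, path_b):
    """Hypothetical wrapper: read both files and compare them by line."""
    with open(path_b) as fb:
        b_lines = [line.rstrip("\n") for line in fb]
    with open(path_a) as fa:
        a_lines = [line.rstrip("\n") for line in fa]
    return lines_only_in_a(a_lines, b_lines)
```

With the files from the timings above this would be called as
diff_files("A", "B"). The whole of B has to fit in memory as a set,
which should be fine for 100,000s of records but is an assumption
worth checking for much larger inputs.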
--
Pádraig Brady - http://www.pixelbeat.org