[Tutor] Fastest way to iterate through a file
Kent Johnson
kent37 at tds.net
Tue Jun 26 17:28:30 CEST 2007
Robert Hicks wrote:
> idList only has about 129 id numbers in it.
That still means about 129 separate searches for every line of the file.
Try combining the ids into a single regular expression and searching each
line once instead, like this:
import re
regex = re.compile('|'.join(idList))
for line in f2:
    if regex.search(line):
        pass  # process a hit
A simple test with timeit shows the regex search to be about 25 times
faster than looping over the ids.
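As a fuller sketch of the same idea, the loop can be wrapped in a small generator. The names matching_lines and sample are made up for illustration, and the re.escape call is my addition: it only matters if an id could contain characters that are special in a regular expression.

```python
import re

def matching_lines(lines, id_list):
    """Yield each line that contains at least one id from id_list."""
    # re.escape guards against ids containing regex metacharacters.
    # Note: an empty id_list gives an empty pattern, which matches every line.
    pattern = re.compile("|".join(re.escape(i) for i in id_list))
    for line in lines:
        if pattern.search(line):
            yield line

# Hypothetical sample data standing in for the real file.
sample = ["order 1001 shipped\n", "nothing here\n", "refund for 1099\n"]
hits = list(matching_lines(sample, ["1001", "1099"]))
```

With a real file you would pass the open file object in place of sample, since a file iterates line by line.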
Searching for each of 100 id strings in another string:
In [6]: import timeit
In [9]: setup = "import re; import string; ids=[str(i) for i in range(1000, 1100)];line=string.letters"
In [10]: timeit.Timer('for i in ids: i in line', setup).timeit()
Out[10]: 15.298269987106323
Build a regular expression to match all the ids and use that to search:
In [11]: setup2=setup + ";regex=re.compile('|'.join(ids))"
In [12]: timeit.Timer('regex.search(line)', setup2).timeit()
Out[12]: 0.58947491645812988
In [15]: _10 / _12
Out[15]: 25.95236804820507
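If you want to repeat the comparison outside IPython, something like the following should work as a standalone script. This is a sketch under a few assumptions: it uses string.ascii_letters (string.letters is gone in Python 3), and the repetition count of 20000 is arbitrary, picked only to keep the run short. The ratio you get will depend on your Python version and on your data, so treat the 25x figure above as one measurement, not a guarantee.

```python
import timeit

# Same setup as the session above, with the regex built up front.
setup = (
    "import re, string; "
    "ids = [str(i) for i in range(1000, 1100)]; "
    "line = string.ascii_letters; "
    "regex = re.compile('|'.join(ids))"
)

number = 20000  # arbitrary repetition count, chosen to keep the run short

# One 'in' test per id versus a single search with the combined regex.
loop_time = timeit.timeit("for i in ids: i in line", setup, number=number)
regex_time = timeit.timeit("regex.search(line)", setup, number=number)

print("loop: %.3fs  regex: %.3fs" % (loop_time, regex_time))
```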
> I am running it straight from a Linux console. I thought about buffering
> but I am not sure how Python handles that.
I don't think the console should be buffered.
> Do you know if Python has a "slower" startup time than Perl? That could
> be part of it though I suspect the buffering thing more.
I don't know if it is slower than Perl but it doesn't take a few seconds
on my computer. How long does it take you to get to the interpreter
prompt if you just start Python? You could put a simple print at the
start of your program to see when it starts executing.
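For example, a timestamped marker at the top and bottom of the script would show how much of the delay happens before your code runs at all. This is only a sketch; the helper name mark is made up, and it writes to stderr and flushes so the message shows up right away.

```python
import sys
import time

def mark(label, stream=sys.stderr):
    """Write a timestamped marker and return the timestamp."""
    now = time.time()
    stream.write("%s at %.3f\n" % (label, now))
    stream.flush()  # make sure the marker appears immediately
    return now

start = mark("script started")
# ... the real work of the program would go here ...
end = mark("script finished")
elapsed = end - start
```

To time the interpreter startup by itself, you can also run "time python -c pass" from the shell.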
Kent