[Tutor] Fastest way to iterate through a file

Tue Jun 26 17:28:30 CEST 2007

Robert Hicks wrote:
> idList only has about 129 id numbers in it.

That is quite a few times to be searching each line of the file. Try 
using a regular expression search instead, like this:

import re
# One pattern that matches any of the ids
regex = re.compile('|'.join(idList))
for line in f2:
    if regex.search(line):
        pass  # process a hit

A simple test shows that to be about 25 times faster.

Searching for each of 100 id strings in another string:
In [6]: import timeit
In [9]: setup = "import re; import string; ids=[str(i) for i in range(1000, 1100)];line=string.letters"
In [10]: timeit.Timer('for i in ids: i in line', setup).timeit()
Out[10]: 15.298269987106323

Build a regular expression to match all the ids and use that to search:
In [11]: setup2=setup + ";regex=re.compile('|'.join(ids))"
In [12]: timeit.Timer('regex.search(line)', setup2).timeit()
Out[12]: 0.58947491645812988

In [15]: _10 / _12
Out[15]: 25.95236804820507
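
To check that the two approaches find the same lines, here is a minimal sketch that runs both over a few sample lines. It assumes Python 3; the names id_list, lines, hits_with_in and hits_with_regex are made up for illustration, and re.escape is an addition in case an id ever contains a regex metacharacter.

```python
import re

def hits_with_in(lines, id_list):
    """Return the lines containing any id, testing each id with 'in'."""
    return [line for line in lines if any(i in line for i in id_list)]

def hits_with_regex(lines, id_list):
    """Return the lines containing any id, using one compiled alternation."""
    regex = re.compile('|'.join(re.escape(i) for i in id_list))
    return [line for line in lines if regex.search(line)]

# Hypothetical sample data standing in for idList and the file f2.
id_list = [str(i) for i in range(1000, 1100)]
lines = ["record 1005 ok\n", "no id here\n", "x1099y\n", "id 2000\n"]

in_hits = hits_with_in(lines, id_list)
regex_hits = hits_with_regex(lines, id_list)
print(in_hits == regex_hits)
```

Both versions do plain substring matching, so an id such as 1005 would also hit inside a longer number like 210050. The sketch does not handle an empty id list, where the joined pattern would match every line.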

> I am running it straight from a Linux console. I thought about buffering 
> but I am not sure how Python handles that.

I don't think the console should be buffered.
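
If you want to rule buffering out, one option is to flush explicitly after each progress message. A minimal sketch, assuming Python 3 (where print accepts flush=True); report is a hypothetical helper, not something from the original script.

```python
import sys

def report(message, stream=None):
    """Write a progress message and flush it immediately."""
    if stream is None:
        stream = sys.stdout
    print(message, file=stream, flush=True)

# isatty() says whether output goes to a terminal or has been redirected.
report("stdout is a terminal: %s" % sys.stdout.isatty())
```

If the messages still show up late with an explicit flush, the delay is probably somewhere other than output buffering.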

> Do you know if Python has a "slower" startup time than Perl? That could 
> be part of it though I suspect the buffering thing more.

I don't know if it is slower than Perl but it doesn't take a few seconds 
on my computer. How long does it take you to get to the interpreter 
prompt if you just start Python? You could put a simple print at the 
start of your program to see when it starts executing.
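
A minimal sketch of that check, assuming Python 3. The names started and elapsed are only for illustration, and the comment in the middle stands in for the real program.

```python
import time

# First statements in the script: by the time this prints, interpreter
# startup is finished and the program body has begun executing.
started = time.perf_counter()
print("started at", time.strftime("%H:%M:%S"), flush=True)

# ... the rest of the program would go here ...

elapsed = time.perf_counter() - started
print("body ran for %.3f seconds" % elapsed)
```

Comparing the moment you press Enter with the "started at" line gives a rough idea of startup time, while the second print shows how long the program body itself took.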

Kent