[Tutor] Fastest way to iterate through a file
kent37 at tds.net
Tue Jun 26 17:28:30 CEST 2007
Robert Hicks wrote:
> idList only has about 129 id numbers in it.
That is quite a few times to be searching each line of the file. Try
using a regular expression search instead, like this:
import re

regex = re.compile('|'.join(idList))
for line in f2:
    if regex.search(line):
        pass  # process a hit
A simple test shows that to be about 25 times faster.
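For reference, here is a minimal self-contained sketch of the same idea. The function name lines_matching_ids and the sample data are made up for illustration, and the re.escape call is an addition that only matters if an id contains regex metacharacters. Like the plain substring test, it also matches an id that appears inside a longer number.

```python
import re

def lines_matching_ids(lines, id_list):
    """Yield each line that contains at least one id from id_list."""
    # One alternation pattern covers every id, so each line is scanned once.
    # re.escape is an assumption-free safety step for ids with special characters.
    regex = re.compile('|'.join(re.escape(i) for i in id_list))
    for line in lines:
        if regex.search(line):
            yield line

# Hypothetical sample data standing in for the real file and id list
sample = ["alpha 1001 beta\n", "no ids here\n", "gamma 2002\n"]
hits = list(lines_matching_ids(sample, ["1001", "2002"]))
```

Since a file object iterates line by line, an open file can be passed directly as the lines argument.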
Searching for each of 100 id strings in another string:
In : import timeit
In : setup = "import re; import string; ids=[str(i) for i in
In : timeit.Timer('for i in ids: i in line', setup).timeit()
Build a regular expression to match all the ids and use that to search:
In : setup2=setup + ";regex=re.compile('|'.join(ids))"
In : timeit.Timer('regex.search(line)', setup2).timeit()
In : _10 / _12
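A runnable version of this kind of comparison might look like the sketch below. The id range, the test line, and the repeat count are assumptions chosen for illustration, not the values from the session above, so the ratio it produces may differ a lot from the figure quoted and depends on the Python version and the data.

```python
import timeit

# Assumed setup: 100 numeric ids and an 80-character line containing none of them
setup = "import re; ids = [str(i) for i in range(1000, 1100)]; line = 'x' * 80"
setup2 = setup + "; regex = re.compile('|'.join(ids))"

# Test each id against the line with a separate substring search
loop_time = timeit.timeit('for i in ids: i in line', setup, number=10000)

# Search the line once with a single compiled alternation
regex_time = timeit.timeit('regex.search(line)', setup2, number=10000)

ratio = loop_time / regex_time
print(f"loop: {loop_time:.4f} s, regex: {regex_time:.4f} s, ratio: {ratio:.2f}")
```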
> I am running it straight from a Linux console. I thought about buffering
> but I am not sure how Python handles that.
I don't think the console should be buffered.
> Do you know if Python has a "slower" startup time than Perl? That could
> be part of it though I suspect the buffering thing more.
I don't know if it is slower than Perl but it doesn't take a few seconds
on my computer. How long does it take you to get to the interpreter
prompt if you just start Python? You could put a simple print at the
start of your program to see when it starts executing.
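If you would rather have a number than an impression, one rough way to measure startup is to time a child interpreter that does nothing. This is only a sketch: it uses whichever Python is running the script, and the result includes the cost of creating the process.

```python
import subprocess
import sys
import time

# Time how long it takes to start the interpreter, run an empty program, and exit
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
startup_seconds = time.perf_counter() - start

print(f"interpreter startup: {startup_seconds:.3f} s")
```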