Lazy "for line in f" ?

Duncan Booth duncan.booth at invalid.invalid
Mon Jul 23 12:18:55 CEST 2007


Alexandre Ferrieux <alexandre.ferrieux at gmail.com> wrote:

> On Jul 23, 10:33 am, Duncan Booth <duncan.bo... at invalid.invalid>
> wrote:
>>
>> The extra buffering means that iterating over a file is about 3 times
>> faster than repeatedly calling readline.
>>
>>     while 1:
>>         line = f.readline()
>>         if not line:
>>             break
>>
>>     for line in f:
>>         pass
>>
> 
> Surely you'll notice that the comparison is spoilt by the fact that
> the readline version needs an interpreted test each turn around.
> A more interesting test would be the C-implemented iterator, just
> calling fgets() (the thin layer policy) without extra 8k-blocking.
> 
No, I believe the comparison is perfectly fair. You need the extra test 
for the readline version whatever you do, and you don't need it for the 
iterator.

If you insist, you can add an identical 'if not line: break' into the
iterator version as well: it adds another 10% onto the iterator runtime,
which still leaves it nearly a factor of 3 faster than the readline
version, but then you aren't comparing equivalent code.

Alternatively you can knock a chunk off the time for the readline loop 
by writing it as:

   while f.readline():
       pass

or even:

   read = f.readline
   while read():
       pass

which gets it down from 10.3 to 9.0 seconds. It's 'fair' in your book 
since it avoids all the extra interpreter overhead of attribute lookup 
and a separate test, but it does make it a touch hard to do anything 
useful with the actual data.
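
For anyone who wants to repeat the comparison, here is a rough sketch of a
timing harness for the three loops above. It is not the script behind the
figures quoted in this thread: the helper names (make_sample, count_readline,
count_bound_readline, count_iter, timed) and the sample file size are made up
for illustration, and it assumes a current Python where files are context
managers. Absolute times and even the ratios will depend on the Python
version, the platform and the file.

```python
import os
import tempfile
import time


def make_sample(n_lines=200000):
    # Write a throwaway text file to read back; the size is arbitrary.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        for i in range(n_lines):
            out.write("line %d\n" % i)
    return path


def count_readline(path):
    # Explicit readline loop with an interpreted end-of-file test.
    n = 0
    with open(path) as f:
        while 1:
            line = f.readline()
            if not line:
                break
            n += 1
    return n


def count_bound_readline(path):
    # Same idea, with the bound method looked up once outside the loop.
    n = 0
    with open(path) as f:
        read = f.readline
        while read():
            n += 1
    return n


def count_iter(path):
    # Plain file iteration.
    n = 0
    with open(path) as f:
        for line in f:
            n += 1
    return n


def timed(func, path):
    # Return the function's result and the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    path = make_sample()
    try:
        for func in (count_readline, count_bound_readline, count_iter):
            lines, seconds = timed(func, path)
            print("%-22s %d lines in %.3f s" % (func.__name__, lines, seconds))
    finally:
        os.remove(path)
```

Each loop counts the lines so that the three variants can be checked against
each other before anyone trusts the timings.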

Whatever, the iterator makes the code both cleaner and faster. It is at
the expense of not being suitable for interactive sessions, or in some
cases pipes, but for those situations you can continue to use readline
and the extra overhead in runtime is unlikely to be noticeable.
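
If you want to stay with readline for an interactive session or a pipe but
still write a for loop, one option is the two-argument form of the builtin
iter(), which keeps calling a function until it returns a sentinel value.
What follows is only a sketch: lines_unbuffered is a made-up helper name, the
StringIO object stands in for a real pipe or terminal, and the "" sentinel
assumes a text-mode file (a binary file would need b"").

```python
import io


def lines_unbuffered(f):
    # iter(callable, sentinel) calls f.readline() repeatedly and stops
    # when it returns "", which is what readline returns at end of file.
    return iter(f.readline, "")


if __name__ == "__main__":
    # StringIO stands in for a pipe or terminal in this sketch.
    fake_pipe = io.StringIO("first\nsecond\n")
    for line in lines_unbuffered(fake_pipe):
        print(line.rstrip("\n"))
```

Every line still costs one readline call, so this buys the for-loop syntax
rather than the speed of plain file iteration.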


