Lazy "for line in f" ?
Duncan Booth
duncan.booth at invalid.invalid
Mon Jul 23 06:18:55 EDT 2007
Alexandre Ferrieux <alexandre.ferrieux at gmail.com> wrote:
> On Jul 23, 10:33 am, Duncan Booth <duncan.bo... at invalid.invalid>
> wrote:
>>
>> The extra buffering means that iterating over a file is about 3 times
>> faster than repeatedly calling readline.
>>
>> while 1:
>>     line = f.readline()
>>     if not line:
>>         break
>>
>> for line in f:
>>     pass
>>
>
> Surely you'll notice that the comparison is spoilt by the fact that
> the readline version needs an interpreted test each turn around.
> A more interesting test would be the C-implemented iterator, just
> calling fgets() (the thin layer policy) without extra 8k-blocking.
>
No, I believe the comparison is perfectly fair. You need the extra test
for the readline version whatever you do, and you don't need it for the
iterator.
If you insist, you can add an identical 'if not line: break' into the
iterator version as well: it adds another 10% onto the iterator runtime
which is still nearly a factor of 3 faster than the readline version,
but then you aren't comparing equivalent code.
Alternatively you can knock a chunk off the time for the readline loop
by writing it as:
    while f.readline():
        pass
or even:
    read = f.readline
    while read():
        pass
which gets it down from 10.3 to 9.0 seconds. It's 'fair' in your book
since it avoids all the extra interpreter overhead of attribute lookup
and a separate test, but it does make it a touch hard to do anything
useful with the actual data.
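For anyone who wants to repeat the comparison, here is a rough sketch of a timing harness. It is not the script behind the figures above: it is written for current Python 3, the helper names (make_sample, time_loop and the three loop_* functions) are made up for illustration, and each loop counts lines rather than doing a bare 'pass', so the absolute numbers and ratios will differ from the ones quoted.

```python
import os
import tempfile
import timeit


def loop_readline(f):
    """Explicit readline loop with a separate end-of-file test."""
    count = 0
    while 1:
        line = f.readline()
        if not line:
            break
        count += 1
    return count


def loop_bound_readline(f):
    """readline loop with the attribute lookup hoisted out of the loop."""
    count = 0
    read = f.readline
    while read():
        count += 1
    return count


def loop_iterator(f):
    """Plain iteration over the file object."""
    count = 0
    for line in f:
        count += 1
    return count


def make_sample(lines=20000):
    """Write a throwaway text file (hypothetical test data) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(lines):
            f.write("line %d\n" % i)
    return path


def time_loop(loop, path, repeat=3):
    """Return the best of `repeat` timings for one full pass over the file."""
    def run():
        with open(path) as f:
            return loop(f)
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == "__main__":
    path = make_sample()
    try:
        for loop in (loop_readline, loop_bound_readline, loop_iterator):
            print("%-20s %.4f s" % (loop.__name__, time_loop(loop, path)))
    finally:
        os.remove(path)
```

Taking the minimum of a few repeats reduces noise from disk caching and other processes; a much larger sample file gives steadier figures.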
Whatever, the iterator makes the code both cleaner and faster. It is at
the expense of not being suitable for interactive sessions, or in some
cases pipes, but for those situations you can continue to use readline
and the extra runtime overhead is unlikely to be noticeable.
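If you want to keep the 'for' syntax in those situations, one possible sketch is the two-argument form of the builtin iter(), which calls readline once per step and stops when it returns the empty string. The helper name lines_as_available is made up for illustration, and the sentinel assumes a text-mode file (a binary file would need b"" instead).

```python
import io


def lines_as_available(f):
    """Yield lines by calling f.readline() once per step.

    iter(callable, sentinel) stops when readline returns the sentinel,
    which for a text-mode file is the empty string at end of file.
    """
    return iter(f.readline, "")


if __name__ == "__main__":
    # io.StringIO stands in for a pipe or terminal in this sketch.
    sample = io.StringIO("first\nsecond\n")
    for line in lines_as_available(sample):
        print(repr(line))
```

This still pays the per-call cost of readline, so it trades the iterator's speed for the ability to hand over each line as soon as it has been read.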