iterating over the lines of a file - difference between Python 2.7 and 3?

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Thu Jan 17 11:35:31 CET 2013


I just came across an unexpected behavior in Python 3.3 that has to do with
file iterators and their interplay with other file/IO methods such as
readline() and tell(). I had gotten used to the fact that it is a bad idea to
mix them: the iterator uses a hidden read-ahead buffer, so what you got from a
subsequent readline() or tell() call referred to the position beyond that
buffer, not to the position right after the line the iterator had just
returned.

Example:

in_file_object = open('some_file', 'rb')

for line in in_file_object:
    print(line)
    if in_file_object.tell() > 300:
        # assuming that individual lines are shorter than 300 bytes
        break

In Python 2.7 this prints only the first line, because next(in_file_object)
immediately reads ahead well beyond position 300, as a subsequent call to
in_file_object.tell() shows (it returns 8192 on my system).

Under Python 3.3, however, the same code works: it prints several lines from
my file, and after the loop ends in_file_object.tell() returns a quite
reasonable 314 as the current position in the file.
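A minimal Python 3 sketch of the binary-mode behavior, assuming a throwaway temporary file that the script writes itself (the contents are made up for illustration): while iterating over a file opened with 'rb', tell() appears to report the offset just past the last line returned, not the end of the read-ahead buffer.

```python
import os
import tempfile


def positions_while_iterating(path):
    """Return (line_length, tell()) pairs seen while iterating in binary mode."""
    seen = []
    with open(path, 'rb') as f:
        for line in f:
            # On a buffered binary file, tell() accounts for bytes that
            # were read ahead but not yet handed out by the iterator.
            seen.append((len(line), f.tell()))
    return seen


# Hypothetical sample data: 100 lines of 10 bytes each (1000 bytes total).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'123456789\n' * 100)

pairs = positions_while_iterating(path)
os.remove(path)
print(pairs[:3])  # [(10, 10), (10, 20), (10, 30)]
```

Each reported position is exactly the sum of the line lengths consumed so far, which matches the "reasonable 314" observation above.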

I couldn't find this difference anywhere in the documentation. Is the 3.3
behavior official, and if so, when was it introduced and how is it
implemented? I assume the read-ahead buffer still exists?
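One way to check whether the read-ahead buffer is still there in Python 3 is to compare the buffered object's tell() with that of the raw file underneath it. This is a sketch, assuming CPython's io module, where open(..., 'rb') returns an io.BufferedReader that exposes its underlying io.FileIO as .raw; the sample file is hypothetical. As far as I can tell, the raw position jumps ahead by a whole buffer (or to EOF for a small file), while the buffered tell() subtracts the bytes still waiting in the buffer.

```python
import os
import tempfile

# Hypothetical sample file: 1000 bytes in 100 short lines.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'abcdefghi\n' * 100)

with open(path, 'rb') as f:      # an io.BufferedReader
    first = f.readline()
    logical = f.tell()           # position right after the line returned
    physical = f.raw.tell()      # position of the underlying io.FileIO
os.remove(path)

# The raw file has been read well past the first line (read-ahead),
# but the buffered tell() still reports the logical position.
print(len(first), logical, physical)
```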

By the way, the 3.3 behavior only applies in binary mode. In text mode, the
same code raises OSError: telling position disabled by next() call. In
Python 2.7 there was no difference between binary and text mode here. I could
not find this documented either.
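The text-mode difference can be reproduced in Python 3 with the sketch below (the sample file is hypothetical). It also shows one possible workaround, assuming tell() is only needed between lines: driving the loop with iter(f.readline, '') instead of the file iterator, which leaves tell() enabled. Note that in text mode tell() returns an opaque cookie meant for seek(), not necessarily a byte offset.

```python
import os
import tempfile

# Hypothetical sample file with a few short text lines.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as out:
    out.write('first\nsecond\nthird\n')

# 1) File-iterator loop in text mode: tell() is disabled while iterating.
error_message = None
with open(path, 'r') as f:
    for line in f:
        try:
            f.tell()
        except OSError as exc:
            error_message = str(exc)
        break

# 2) readline() does not disable tell(), so a readline-driven loop keeps it usable.
cookies = []
with open(path, 'r') as f:
    for line in iter(f.readline, ''):
        # In text mode tell() returns an opaque cookie usable with seek().
        cookies.append((line, f.tell()))

    # Seeking to the cookie taken after 'first\n' resumes at 'second\n'.
    f.seek(cookies[0][1])
    resumed = f.readline()
os.remove(path)

print(error_message)   # telling position disabled by next() call
print(repr(resumed))   # 'second\n'
```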


More information about the Python-list mailing list