iterating over the lines of a file - difference between Python 2.7 and 3?

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Thu Jan 17 17:38:30 CET 2013


Thanks Peter,
for this very helpful reply and for pointing out _pyio.py to me!
It's great to be able to check implementation details sometimes. So, if I understand you correctly, I can simply
import io
and open files with io.open() instead of open() (although this is a bit of a detour in Python 3), and this will ensure version-independent behavior of my code? That's cool!
What will my IO object return when I read from it in Python 2.7? str where Python 3 gives bytes, and unicode where Python 3 gives str? This is what I understood from the Python 2.7 io module documentation.
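
To make the question concrete, here is a minimal sketch of the types I would expect io.open() to hand back. The temporary file and its contents are made up for the example. As far as I know, bytes is simply another name for str in Python 2.7, and type(u"") is unicode there and str in Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample data, written to a temporary file for the demo.
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "wb") as f:
    f.write(b"first line\nsecond line\n")

# Binary mode: bytes in Python 3, which is the same type as str in Python 2.7.
with io.open(path, "rb") as f:
    binary_line = f.readline()

# Text mode: str in Python 3, unicode in Python 2.7.
with io.open(path, "r", encoding="ascii") as f:
    text_line = f.readline()

os.remove(path)

text_type = type(u"")  # unicode on 2.7, str on 3.x
print(type(binary_line))
print(type(text_line))
```

If that is right, comparing against b"..." literals in binary mode and u"..." literals in text mode should work the same way under both versions.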


-----Original Message-----
From: Peter Otten [mailto:__peter__ at web.de] 
Sent: Thursday, January 17, 2013 1:04 PM
To: python-list at python.org
Subject: Re: iterating over the lines of a file - difference between Python 2.7 and 3?

You can get the Python 3 behaviour with io.open() in Python 2.7. There is an implementation in Python in _pyio.py:

    def tell(self):
        return _BufferedIOMixin.tell(self) - len(self._read_buf) + self._read_pos
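
As a rough sketch of what that correction means in practice (the sample file here is made up), tell() in binary mode reports the logical position after each line, even though the buffer has already read further ahead:

```python
import io
import os
import tempfile

# Hypothetical sample file: 100 lines of 10 bytes each.
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "wb") as f:
    f.write(b"123456789\n" * 100)

positions = []
with io.open(path, "rb") as f:
    for line in f:
        # tell() subtracts the unread part of the read-ahead buffer,
        # so it points just past the line the iterator returned.
        positions.append(f.tell())
        if f.tell() > 300:
            break

os.remove(path)
print(positions[-1])
```

With 10-byte lines the loop should stop at position 310, after 31 lines, rather than at the end of the buffered data.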



Wolfgang Maier wrote:

> I just came across an unexpected behavior in Python 3.3, which has to
> do with file iterators and their interplay with other methods of
> file/IO objects, like readline() and tell(). Basically, I got
> used to the fact that it is a bad idea to mix them because the
> iterator uses a hidden read-ahead buffer, so what you get from
> subsequent calls to readline() or tell() is what lies beyond that
> buffer, not the next thing after what the iterator just returned.
> 
> Example:
> 
> in_file_object = open('some_file', 'rb')
> for line in in_file_object:
>     print(line)
>     if in_file_object.tell() > 300:
>         # assuming that individual lines are shorter
>         break
>
> This would print only the first line in Python 2.7, since
> next(in_file_object) reads ahead beyond position 300 immediately, as
> evidenced by a subsequent call to in_file_object.tell() (returning 8192
> on my system).
> 
> However, I find that under Python 3.3 this same code works: it prints
> some lines from my file and, after it completes, in_file_object.tell()
> returns a quite reasonable 314 as the current position in the file.
> 
> I couldn't find this difference anywhere in the documentation. Is the
> 3.3 behavior official, and if so, when was it introduced and how is it
> implemented? I assume the read-ahead buffer still exists?
>
> By the way, the 3.3 behavior only works in binary mode. In text mode, the
> code raises an OSError: telling position disabled by next() call. In
> Python 2.7 there was no difference between the binary and text mode
> behavior. I could not find this documented either.
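
A small sketch of the text-mode case described in the quoted message, using a made-up sample file. It assumes the io module's text layer, where the error surfaces as OSError in Python 3 (IOError is an alias for it there). The readline()-based loop is only one possible way around the problem, and tell() on a text file returns an opaque number that is not guaranteed to be a byte offset.

```python
import io
import os
import tempfile

# Hypothetical sample file: 100 ASCII lines of 10 characters each.
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "w", encoding="ascii", newline="\n") as f:
    f.write(u"123456789\n" * 100)

# Iterating with a for loop disables tell() on text files.
error_message = None
with io.open(path, "r", encoding="ascii") as f:
    for line in f:
        try:
            f.tell()
        except (IOError, OSError) as exc:
            error_message = str(exc)
        break

# Calling readline() explicitly keeps tell() usable.
last_position = None
with io.open(path, "r", encoding="ascii") as f:
    for line in iter(f.readline, u""):
        last_position = f.tell()
        if last_position > 300:
            break

os.remove(path)
print(error_message)
print(last_position)
```

The second loop trades the convenience of the file iterator for a tell() that stays in step with the lines already returned.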






