[New-bugs-announce] [issue33361] readline() + seek() on io.EncodedFile breaks next readline()

Diego Argueta report at bugs.python.org
Wed Apr 25 21:06:03 EDT 2018


New submission from Diego Argueta <diego.argueta at gmail.com>:

It appears that calling readline() on a codecs.EncodedFile stream and then calling seek() leaves the stream in an inconsistent state: subsequent attempts to iterate over the lines or call readline() return lines out of order, including lines that were already consumed.

A minimal example:

```
from __future__ import print_function

import codecs
import io


def run(stream):
    # Remember the current position, peek at the first line, then restore it.
    offset = stream.tell()
    try:
        stream.seek(0)
        header_row = stream.readline()
    finally:
        stream.seek(offset)

    print('Got header: %r' % header_row)

    if stream.tell() == 0:
        print('Skipping the header: %r' % stream.readline())

    for index, line in enumerate(stream, start=2):
        print('Line %d: %r' % (index, line))


b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
s = codecs.EncodedFile(b, 'utf-8', 'utf-16-le')

run(s)
```

Output:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'    <-- this is line 2
Line 2: 'a,b\r\n'                           <-- this is line 1
Line 3: '"asdf","jkl;"\r\n'                 <-- now we're back to line 2
```

As you can see, the line being skipped is actually the second line, and when we try reading from the stream again, the iterator starts from the beginning of the file.

Even weirder, adding a second call to readline() to skip the second line shows it's going **backwards**:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'    <-- this is actually line 2
Skipping the second line: 'a,b\r\n'         <-- this is line 1
Line 2: '"asdf","jkl;"\r\n'                 <-- this is now correct
```
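A possible explanation, offered as a sketch rather than a confirmed diagnosis: the codec reader seems to decode more than one line per read and keep the remainder in an internal buffer, so a seek that only moves the underlying stream would leave that buffer in place. The snippet below uses a bare codecs.StreamReader and rewinds the BytesIO directly. The assumption is that this is roughly what seek() on the EncodedFile ends up doing; the names `raw`, `reader`, `seen` and `after_reset` are only for illustration.

```python
import codecs
import io

data = u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le')
raw = io.BytesIO(data)
reader = codecs.getreader('utf-16-le')(raw)

first = reader.readline()
# The reader has pulled the whole payload from the underlying stream,
# although only one line was handed back to the caller.
consumed = raw.tell()

# Rewind the underlying stream only; the reader's buffers are untouched.
raw.seek(0)
seen = [reader.readline() for _ in range(3)]

# For comparison, seeking through the reader also resets its buffers.
reader.seek(0)
after_reset = reader.readline()

print(repr(first), consumed, len(data))
print(seen)
print(repr(after_reset))
```

If the assumption holds, `seen` comes back in the same "line 2, line 1, line 2" order as the output above, while `after_reset` is the first line again.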

The expected output would show that we read the header, skipped it, and then read one data line:

```
Got header: 'a,b\r\n'
Skipping the header: 'a,b\r\n'
Line 2: '"asdf","jkl;"\r\n'
```

I'm sure this is related to the implementation of readline() because if we change this:

```
header_row = stream.readline()
```

to this:

```
header_row = stream.read().splitlines()[0]
```

then we get the expected output. If, on the other hand, we comment out the seek() in the finally clause, we also get the expected output (minus the "Skipping the header" line).
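Another change that might serve as a workaround is to call reset() on the stream right after restoring the position. This is only a sketch: it assumes that reset() on the object returned by codecs.EncodedFile discards whatever the reader buffered during readline(), and `peek_header` is a hypothetical helper name, not part of the original script.

```python
import codecs
import io


def peek_header(stream):
    """Return the first line and restore the previous position."""
    offset = stream.tell()
    try:
        stream.seek(0)
        return stream.readline()
    finally:
        stream.seek(offset)
        # Assumption: reset() drops data the reader buffered during
        # readline(), so the next read starts from the restored position.
        stream.reset()


b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
s = codecs.EncodedFile(b, 'utf-8', 'utf-16-le')

header_row = peek_header(s)
skipped = s.readline()
remaining = list(s)
print(repr(header_row), repr(skipped), remaining)
```

Note that tell() here reports the position of the underlying byte stream, so a helper like this is probably only reliable when it is called before anything else has been read from the stream.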

----------
components: IO, Library (Lib)
messages: 315768
nosy: da
priority: normal
severity: normal
status: open
title: readline() + seek() on io.EncodedFile breaks next readline()
type: behavior
versions: Python 2.7, Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33361>
_______________________________________

