[New-bugs-announce] [issue12116] io.Buffer*.seek() doesn't seek if "seeking leaves us inside the current buffer"

STINNER Victor report at bugs.python.org
Thu May 19 18:04:46 CEST 2011


New submission from STINNER Victor <victor.stinner at haypocalc.com>:

Example:

with open("setup.py", "rb") as f:
    # read less than the file size so that the readahead buffer gets filled
    f.read(1)
    # seek() only moves the buffered position; the raw file is not repositioned
    f.seek(0)
    print("f pos=", f.tell())
    print("f.raw pos=", f.raw.tell())

Output:

f pos= 0
f.raw pos= 4096

I expect f.raw.tell() to be 0.
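
The example above depends on a setup.py being present and on the default buffer size. A self-contained variant is sketched here; the temporary file name, the 4096-byte buffer passed to open() and the helper name are assumptions chosen for illustration, not part of the original report.

```python
import os
import tempfile

BUFFER_SIZE = 4096  # assumed buffer size, passed explicitly to open()


def positions_after_seek():
    """Return (buffered position, raw position) after read(1) + seek(0)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")  # hypothetical file name
        with open(path, "wb") as out:
            # Make the file larger than the buffer so one fill does not hit EOF.
            out.write(b"x" * (3 * BUFFER_SIZE))
        with open(path, "rb", buffering=BUFFER_SIZE) as f:
            f.read(1)   # fills the readahead buffer from the raw file
            f.seek(0)   # target lies inside the buffer
            return f.tell(), f.raw.tell()


if __name__ == "__main__":
    print(positions_after_seek())
```

On a CPython build that contains the fast path quoted below, the buffered position should come back as 0 while the raw position stays at the end of the filled buffer.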

Extract of Modules/_io/buffered.c:

    if (whence != 2 && self->readable) {
        Py_off_t current, avail;
        /* Check if seeking leaves us inside the current buffer,
           so as to return quickly if possible. Also, we needn't take the
           lock in this fast path.
           Don't know how to do that when whence == 2, though. */
        /* NOTE: RAW_TELL() can release the GIL but the object is in a stable
           state at this point. */
        current = RAW_TELL(self);
        avail = READAHEAD(self);
        printf("current=%"  PY_PRIdOFF ", avail=%"  PY_PRIdOFF "\n", current, avail);
        if (avail > 0) {
            Py_off_t offset;
            if (whence == 0)
                offset = target - (current - RAW_OFFSET(self));
            else
                offset = target;
            printf("offset=%"  PY_PRIdOFF "\n", offset);
            if (offset >= -self->pos && offset <= avail) {
                printf("NO SEEK!\n");
                self->pos += offset;
                return PyLong_FromOff_t(current - avail + offset);
            }
        }
    }
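
To make the arithmetic of this fast path easier to follow, here is a small Python model of it. It is only a sketch: the function name and parameters are hypothetical, and it assumes that READAHEAD(self) is read_end - pos and RAW_OFFSET(self) is raw_pos - pos, which is my reading of the macros and is not shown in the extract.

```python
def fast_path_seek(target, whence, current, pos, raw_pos, read_end):
    """Model of the extract: return the new position, or None if a real
    raw seek would be needed.

    current  -- absolute position of the raw file (RAW_TELL)
    pos      -- read position inside the buffer (self->pos)
    raw_pos  -- raw position relative to the buffer start
                (assumption: RAW_OFFSET = raw_pos - pos)
    read_end -- end of valid data in the buffer
                (assumption: READAHEAD = read_end - pos)
    """
    if whence == 2:
        return None  # the fast path is not attempted for whence == 2
    avail = read_end - pos
    if avail <= 0:
        return None
    if whence == 0:
        # Convert the absolute target into an offset from the buffered position.
        offset = target - (current - (raw_pos - pos))
    else:
        offset = target
    if -pos <= offset <= avail:
        # Only the in-buffer position would change; the raw file is untouched.
        return current - avail + offset
    return None
```

With the state from the example (one byte consumed out of a 4096-byte buffer), fast_path_seek(0, 0, 4096, 1, 4096, 4096) takes the fast path and returns 0, so the raw file is never repositioned.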

I found this weird behaviour when trying to understand why:

        with open("setup.py", 'rb') as f:
            encoding, lines = tokenize.detect_encoding(f.readline)
        with open("setup.py", 'r', encoding=encoding) as f:
            imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))

behaves differently from:

        with tokenize.open("setup.py") as f:
            imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))

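imp.load_module() clones the file using something like fd = os.dup(f.fileno()); clone = os.fdopen(fd, "r"). The duplicated descriptor shares its file offset with the original, so the clone starts reading at the raw position instead of at 0.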
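
The effect of such a clone can be sketched without imp. This is an illustration under assumptions: the temporary file, the explicit 4096-byte buffer, binary mode "rb" instead of "r", and the helper name are mine, and I have not checked what imp.load_module() does beyond the description above.

```python
import os
import tempfile

BUFFER_SIZE = 4096  # assumed buffer size, passed explicitly to open()


def clone_start_offset():
    """Return the position at which a dup()ed clone of the file starts."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")  # hypothetical file name
        with open(path, "wb") as out:
            out.write(b"x" * (2 * BUFFER_SIZE))
        with open(path, "rb", buffering=BUFFER_SIZE) as f:
            f.read(1)
            f.seek(0)  # buffered position is 0, raw position is not
            # The duplicated descriptor shares the file offset with f.raw.
            with os.fdopen(os.dup(f.fileno()), "rb") as clone:
                return clone.tell()


if __name__ == "__main__":
    print(clone_start_offset())
```

If the raw file was left at the end of the first buffer, the clone reports that offset rather than 0, which would explain why the cloned file skips the beginning of setup.py.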

For tokenize.open(), a workaround is to replace:
   buffer.seek(0)
by
   buffer.seek(0); buffer.raw.seek(0)
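
A minimal sketch of what this workaround changes for a cloned descriptor follows. The file contents, the 4096-byte buffer and the helper name are assumptions for illustration. It only checks what the clone sees; I have not verified how the original buffered object behaves after its raw file has been moved underneath it.

```python
import os
import tempfile

BUFFER_SIZE = 4096  # assumed buffer size, passed explicitly to open()


def clone_first_byte(apply_workaround):
    """Return the first byte read by a dup()ed clone after buffer.seek(0)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")  # hypothetical file name
        with open(path, "wb") as out:
            # First block is all "A", second block is all "B".
            out.write(b"A" * BUFFER_SIZE + b"B" * BUFFER_SIZE)
        with open(path, "rb", buffering=BUFFER_SIZE) as buffer:
            buffer.read(1)
            buffer.seek(0)
            if apply_workaround:
                buffer.raw.seek(0)  # also reposition the raw file
            with os.fdopen(os.dup(buffer.fileno()), "rb") as clone:
                return clone.read(1)


if __name__ == "__main__":
    print(clone_first_byte(False), clone_first_byte(True))
```

Without the extra raw seek the clone should start in the second block; with it, the clone starts at the beginning of the file.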

----------
components: IO
messages: 136296
nosy: haypo, pitrou
priority: normal
severity: normal
status: open
title: io.Buffer*.seek() doesn't seek if "seeking leaves us inside the current buffer"
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12116>
_______________________________________

