[Python-ideas] BufferedIO and detach

Robert Collins robertc at robertcollins.net
Mon Mar 4 10:19:06 CET 2013


On 4 March 2013 22:12, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> Some variations I can think of...
>>
>> The buffer_only flag I suggested, on readinto, read1, read etc.
>>
>> Have detach return the buffered data as you suggest - that would be
>> incompatible unless we stash it on the raw object somewhere, or do
>> something along those lines.
>>
>> A read0 - analogous to read1, returns data from the buffer, but
>> guarantees no underlying calls.
>>
>> I think exposing the buffer more explicitly is a good principle,
>> independent of whether we change detach or not.
>
> As Guido noted, you actually have multiple layers of buffering to
> contend with - for a text stream, you may have already decoded
> characters and partially decoded data in the codec's internal buffer,
> in addition to any data in the IO buffer. That's actually one of the
> interesting problems with supporting a "set_encoding()" method on IO
> streams (see http://bugs.python.org/issue15216).

Indeed. Fun! Caches are useful but add complexity :)

> How does the following API sound for your purposes? (this is based on
> what set_encoding() effectively has to do under the hood):
>
>     BufferedReader:
>
>         def push_data(binary_data):
>             """Prepends contents of 'data' to the internal buffer"""
>
>         def clear_buffer():
>             """Clears the internal buffer and returns the previous
>             content as a bytes object"""
>
>     TextIOWrapper:
>
>         def push_data(char_data, binary_data=b""):
>             """Prepends contents of 'data' to the internal buffer. If
>             binary_data is provided, it is pushed into the underlying
>             IO buffered reader. Raises UnsupportedOperation if the
>             underlying stream has no "push_data" method."""
>
>         def clear_buffer():
>             """Clears the internal buffers and returns the previous
>             content as a (char_data, binary_data) pair. The binary
>             data includes any data that was queued inside the codec,
>             as well as the contents of the underlying IO buffer"""

That would make the story of 'get me back to raw IO' straightforward,
though the TextIOWrapper's clear_buffer semantics are a little unclear
to me from just the docstring. I think having TextIOWrapper only
return bytes from clear_buffer and only accept bytes in push_data
would be simpler to reason about, if a little more complex on the
internals.
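
To make the bytes-only variant concrete, here is a rough sketch on a toy
reader. This is not io.BufferedReader: the class name ToyBufferedReader and
its buffer_size argument are made up for illustration, and push_data and
clear_buffer are only the methods proposed above, not anything that exists
in the io module today.

```python
import io


class ToyBufferedReader:
    """Toy stand-in for a buffered reader; NOT io.BufferedReader."""

    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self.buffer_size = buffer_size
        self._buf = b""

    def read(self, n):
        # Refill from the raw stream until n bytes are buffered or EOF.
        while len(self._buf) < n:
            chunk = self.raw.read(self.buffer_size)
            if not chunk:
                break
            self._buf += chunk
        result, self._buf = self._buf[:n], self._buf[n:]
        return result

    def push_data(self, binary_data):
        # Proposed method: prepend bytes to the internal buffer.
        self._buf = bytes(binary_data) + self._buf

    def clear_buffer(self):
        # Proposed method: empty the buffer and return what it held.
        data, self._buf = self._buf, b""
        return data


reader = ToyBufferedReader(io.BytesIO(b"hello world, raw io"))
first = reader.read(5)            # pulls 8 bytes from raw, returns 5 of them
leftover = reader.clear_buffer()  # the 3 bytes that were read ahead
rest = reader.raw.read()          # the raw stream resumes after the read-ahead
```

The point is that first + leftover + rest reproduces the original stream, so
after clear_buffer the caller can go back to raw IO without losing the
read-ahead.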

Now, one could implement 'read0' manually using read1 + clear_buffer +
push_data:
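# first, unwrap back to a bytes layer
buffer = textstream.buffer  # the underlying binary buffer
# hand the binary half of the (char_data, binary_data) pair back down
buffer.push_data(textstream.clear_buffer()[1])
def read0(n):
    # drain the buffer, keep the first n bytes, push the rest back
    data = buffer.clear_buffer()
    result = data[:n]
    buffer.push_data(data[n:])
    return result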

But it might be more efficient to define read0 directly on BufferedReader.
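
As a rough illustration of what a direct read0 could look like, here is a
sketch on a toy class. Everything in it is an assumption for the sake of the
example: ToyBufferedReader and CountingRaw are made-up names, read0 is the
hypothetical method discussed in this thread, and none of this reflects how
io.BufferedReader is implemented.

```python
import io


class CountingRaw(io.BytesIO):
    """Raw stream stand-in that counts read() calls (illustrative only)."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


class ToyBufferedReader:
    """Toy buffered reader with a hypothetical read0(); NOT io.BufferedReader."""

    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self.buffer_size = buffer_size
        self._buf = bytearray()

    def read1(self, n):
        # At most one raw read, and only if the buffer is empty.
        if not self._buf:
            self._buf += self.raw.read(self.buffer_size)
        return self.read0(n)

    def read0(self, n):
        # Serve from the buffer only; never calls the raw stream.
        result = bytes(self._buf[:n])
        del self._buf[:n]
        return result


raw = CountingRaw(b"0123456789")
reader = ToyBufferedReader(raw)
a = reader.read1(3)   # one raw read fills the buffer with 8 bytes
b = reader.read0(10)  # drains the remaining 5 buffered bytes, no raw read
c = reader.read0(10)  # buffer is empty, so this returns b""
```

Compared with the clear_buffer + push_data version, this slices the buffer
in place instead of copying it out and pushing the tail back on every call.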

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services


