[Python-ideas] BufferedIO and detach
Robert Collins
robertc at robertcollins.net
Mon Mar 4 10:19:06 CET 2013
On 4 March 2013 22:12, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> Some variations I can think of...
>>
>> The buffer_only flag I suggested, on read_into, read1, read etc.
>>
>> Have detach return the buffered data as you suggest - that would be
>> incompatible unless we stash it on the raw object somewhere, or do
>> something along those lines.
>>
>> A read0 - analogous to read1, returns data from the buffer, but
>> guarantees no underlying calls.
>>
>> I think exposing the buffer more explicitly is a good principle,
>> independent of whether we change detach or not.
>
> As Guido noted, you actually have multiple layers of buffering to
> contend with - for a text stream, you may have already decoded
> characters and partially decoded data in the codec's internal buffer,
> in addition to any data in the IO buffer. That's actually one of the
> interesting problems with supporting a "set_encoding()" method on IO
> streams (see http://bugs.python.org/issue15216).
Indeed. Fun! Caches are useful but add complexity :)
> How does the following API sound for your purposes? (this is based on
> what set_encoding() effectively has to do under the hood):
>
> BufferedReader:
>
>     def push_data(binary_data):
>         """Prepends contents of 'data' to the internal buffer"""
>
>     def clear_buffer():
>         """Clears the internal buffer and returns the previous
>         content as a bytes object"""
>
> TextIOWrapper:
>
>     def push_data(char_data, binary_data=b""):
>         """Prepends contents of 'data' to the internal buffer. If
>         binary_data is provided, it is pushed into the underlying IO buffered
>         reader. Raises UnsupportedOperation if the underlying stream has no
>         "push_data" method."""
>
>     def clear_buffer():
>         """Clears the internal buffers and returns the previous
>         content as a (char_data, binary_data) pair. The binary data includes
>         any data that was queued inside the codec, as well as the contents of
>         the underlying IO buffer"""
That would make the story of 'get me back to raw IO' straightforward,
though the TextIOWrapper's clear_buffer semantics are a little unclear
to me from just the docstring. I think having TextIOWrapper only
return bytes from clear_buffer and only accept bytes in push_data
would be simpler to reason about, if a little more complex on the
internals.
Now, one could implement 'read0' manually using read1 + clear_buffer +
push_data:
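    # first, unwrap back to a bytes layer
    # (TextIOWrapper.buffer is an attribute, not a method)
    buffer = textstream.buffer
    # clear_buffer() returns (char_data, binary_data); keep the bytes
    buffer.push_data(textstream.clear_buffer()[1])

    def read0(n):
        data = buffer.clear_buffer()
        result = data[:n]
        buffer.push_data(data[n:])
        return result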
But it might be more efficient to define read0 directly on BufferedReader.
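
To make the idea concrete, here is a rough, self-contained sketch of how
push_data, clear_buffer and read0 could fit together. PushbackReader is a
made-up toy class, not io.BufferedReader, and none of these three methods
exist in the io module today; the tiny buffer_size is only there to make
the read-ahead visible.

```python
import io


class PushbackReader:
    """Toy stand-in for a buffered reader with the proposed methods.

    push_data, clear_buffer and read0 are hypothetical names taken from
    this thread; io.BufferedReader does not provide them.
    """

    def __init__(self, raw, buffer_size=8):
        self._raw = raw
        self._buffer_size = buffer_size
        self._buf = b""

    def push_data(self, binary_data):
        """Prepend binary_data to the internal buffer."""
        self._buf = bytes(binary_data) + self._buf

    def clear_buffer(self):
        """Empty the internal buffer and return its previous content."""
        data, self._buf = self._buf, b""
        return data

    def read0(self, n):
        """Return up to n buffered bytes without touching the raw stream."""
        data = self.clear_buffer()
        self.push_data(data[n:])
        return data[:n]

    def read(self, n):
        """Ordinary buffered read: refill from the raw stream when short."""
        while len(self._buf) < n:
            chunk = self._raw.read(max(self._buffer_size, n - len(self._buf)))
            if not chunk:
                break
            self._buf += chunk
        return self.read0(n)


raw = io.BytesIO(b"HEADER:payload-bytes")
reader = PushbackReader(raw)
header = reader.read(7)       # reads ahead: 8 bytes are pulled from raw
buffered = reader.read0(100)  # drains what is buffered, no raw read
rest = raw.read()             # raw continues where the read-ahead stopped
```

With the buffer drained by read0, the caller can go back to the raw stream
without losing the byte that was read ahead, which is the 'get me back to
raw IO' case discussed above.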
-Rob
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services