[Python-ideas] BufferedIO and detach

Mon Mar 4 10:52:37 CET 2013

On 4 Mar 2013 19:19, "Robert Collins" <robertc at robertcollins.net> wrote:
>
> On 4 March 2013 22:12, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > On Mon, Mar 4, 2013 at 4:44 PM, Robert Collins
> > <robertc at robertcollins.net> wrote:
> >> Some variations I can think of...
> >>
> >> The buffer_only flag I suggested, on read_into, read1, read etc.
> >>
> >> Have detach return the buffered data as you suggest - that would be
> >> incompatible unless we stash it on the raw object somewhere, or do
> >> something along those lines.
> >>
> >> A read0 - analogous to read1, returns data from the buffer, but
> >> guarantees no underlying calls.
> >>
> >> I think exposing the buffer more explicitly is a good principle,
> >> independent of whether we change detach or not.
> >
> > As Guido noted, you actually have multiple layers of buffering to
> > contend with - for a text stream, you may have already decoded
> > characters and partially decoded data in the codec's internal buffer,
> > in addition to any data in the IO buffer. That's actually one of the
> > interesting problems with supporting a "set_encoding()" method on IO
> > streams (see http://bugs.python.org/issue15216).
>
> Indeed. Fun! Caches are useful but add complexity :)
>
> > How does the following API sound for your purposes? (this is based on
> > what set_encoding() effectively has to do under the hood):
> >
> >     BufferedReader:
> >
> >         def push_data(binary_data):
> >             """Prepends contents of 'binary_data' to the internal buffer"""
> >
> >         def clear_buffer():
> >             """Clears the internal buffer and returns the previous
> >             content as a bytes object"""
> >
> >     TextIOWrapper:
> >
> >         def push_data(char_data, binary_data=b""):
> >             """Prepends contents of 'char_data' to the internal buffer. If
> >             binary_data is provided, it is pushed into the underlying
> >             buffered reader. Raises UnsupportedOperation if the underlying
> >             stream has no "push_data" method."""
> >
> >         def clear_buffer():
> >             """Clears the internal buffers and returns the previous
> >             content as a (char_data, binary_data) pair. The binary data
> >             includes any data that was queued inside the codec, as well as
> >             the contents of the underlying IO buffer"""
>
> That would make the story of 'get me back to raw IO' straightforward,
> though the TextIOWrapper's clear_buffer semantics are a little unclear
> to me from just the docstring. I think having TextIOWrapper only
> return bytes from clear_buffer and only accept bytes in push_data
> would be simpler to reason about, if a little more complex on the
> internals.

I originally had it defined that way, but as Victor points out in the
set_encoding issue, decoding is potentially lossy in the general case, so
we can't reliably convert already decoded characters back to bytes. The
appropriate way to handle that is going to be application-specific, so I
changed the proposed API to produce a (str, bytes) 2-tuple.
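
As a rough illustration of why the round trip is lossy (a minimal sketch;
the byte values and the errors="replace" handler are chosen for the example
and are not taken from the set_encoding issue): two different invalid byte
sequences decode to the same replacement character, so the decoded text
alone cannot say which bytes were originally read.

```python
# Two distinct bytes, neither of which is valid anywhere in UTF-8.
raw_a = b"\xff"
raw_b = b"\xfe"

# With errors="replace", each invalid byte becomes U+FFFD.
text_a = raw_a.decode("utf-8", errors="replace")
text_b = raw_b.decode("utf-8", errors="replace")

# Re-encoding the decoded text yields the encoding of U+FFFD,
# not the original input bytes.
round_tripped = text_a.encode("utf-8")

print(text_a == text_b)        # both inputs collapse to the same text
print(round_tripped == raw_a)  # the original bytes are not recovered
```

Returning the already decoded characters and the undecoded bytes separately
leaves the choice of what to do with such text to the caller.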

Cheers,
Nick.

>
> Now, one could implement 'read0' manually using read1 + clear_buffer +
> push_data:
> # first, unwrap back to a bytes layer
> buffer = textstream.buffer
> buffer.push_data(textstream.clear_buffer()[1])
> def read0(n):
>     data = buffer.clear_buffer()
>     result = data[:n]
>     buffer.push_data(data[n:])
>     return result
>
> But it might be more efficient to define read0 directly on
> BufferedIOReader.
>
> -Rob
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Cloud Services
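
A minimal sketch of the read0 recipe quoted above, run against a toy
stand-in. ToyBufferedReader and its buffer_size parameter are hypothetical
names made up for this example; push_data and clear_buffer are only the
methods proposed in this thread and do not exist on io.BufferedReader.

```python
import io


class ToyBufferedReader:
    """Toy stand-in (hypothetical) carrying the proposed buffer methods."""

    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self.buffer_size = buffer_size
        self._buf = b""

    def read1(self, n):
        # Touch the raw stream at most once, and only if the buffer is empty.
        if not self._buf:
            self._buf = self.raw.read(max(n, self.buffer_size))
        result, self._buf = self._buf[:n], self._buf[n:]
        return result

    def push_data(self, binary_data):
        # Prepend data to the internal buffer.
        self._buf = bytes(binary_data) + self._buf

    def clear_buffer(self):
        # Empty the internal buffer and return what it held.
        data, self._buf = self._buf, b""
        return data


def read0(buffered, n):
    """Return up to n buffered bytes without any call on the raw stream."""
    data = buffered.clear_buffer()
    buffered.push_data(data[n:])
    return data[:n]


raw = io.BytesIO(b"0123456789abcdef")
reader = ToyBufferedReader(raw)

first = reader.read1(2)      # fills the 8-byte buffer, hands back 2 bytes
second = read0(reader, 4)    # served from the buffer only
third = read0(reader, 10)    # only 2 bytes are left in the buffer
fourth = read0(reader, 5)    # buffer is empty, raw stream is not consulted
position = raw.tell()        # raw stream was read exactly once
```

Here read0 returns short or empty results instead of going back to the raw
stream, which is the guarantee asked for earlier in the thread.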