[Python-ideas] BufferedIO and detach
Robert Collins
robertc at robertcollins.net
Mon Mar 4 11:24:56 CET 2013
On 4 March 2013 22:52, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> I originally had it defined that way, but as Victor points out in the
> set_encoding issue, decoding is potentially lossy in the general case, so we
> can't reliably convert already decoded characters back to bytes. The
> appropriate way to handle that is going to be application specific, so I
> changed the proposed API to produce a (str, bytes) 2-tuple.
I don't quite follow: why would we need to convert decoded characters
back to bytes? Decoding may be lossy, but we know the original bytes. If
we keep the original bytes around until their characters are out of the
buffer, there is no loss window. And the buffer size in TextIOWrapper
is quite small by default, isn't it?
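To make the idea concrete, here is a rough sketch of a buffer that keeps
the original bytes next to each decoded chunk until that chunk's text has
been consumed. Everything in it (the ByteRetainingBuffer class and its
feed, read_chunk and detach_bytes methods) is hypothetical and is not
part of TextIOWrapper or of Nick's proposal. It also assumes each fed
chunk holds only complete characters and is consumed whole.

```python
from collections import deque


class ByteRetainingBuffer:
    """Hypothetical sketch: pair each decoded chunk with its source bytes."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self._chunks = deque()  # (decoded_text, original_bytes) pairs

    def feed(self, raw):
        # Decoding may be lossy (errors="replace"), so keep raw alongside.
        # Assumption: raw does not end in the middle of a character.
        text = raw.decode(self.encoding, errors="replace")
        self._chunks.append((text, raw))

    def read_chunk(self):
        # Once the text is handed out, its original bytes can be dropped.
        text, _raw = self._chunks.popleft()
        return text

    def detach_bytes(self):
        # Return the original bytes of everything not yet consumed,
        # without ever re-encoding decoded text.
        raw = b"".join(chunk_raw for _text, chunk_raw in self._chunks)
        self._chunks.clear()
        return raw
```

With this layout the undecodable byte in b"\xffdef" survives a detach
unchanged, even though its decoded form is a replacement character. A
real implementation would also have to deal with partially consumed
chunks, which this sketch ignores.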
If we need to be strictly minimal then yes, I can see why your tweaked
API would be better. However, two bits of feedback. First, it should say
more clearly that there is no overlap between the text and binary
segments: any bytes that have been decoded are in the text segment and
only in the text segment.
Second, push_data has a wart. Consider a TextIOWrapper with the
following buffer:
text="foo"
binary=b"bar"
when you call push_data("quux", b"baz")
should you end up with
text="quuxfoo"
binary=b"bazbar"
or
text="quux" + b"baz".decode(self.encoding) + "foo"
binary=b"bar"
The latter is clearly the intent, but the docstring implies the former
behaviour. (The latter case does depend on the bytestring being
decodable on its own when there is content in the text buffer, but
even a more complex buffer holding a sequence of text and byte regions
would still have that requirement, because we cannot re-encode reliably.)
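The two readings can be put side by side in a small sketch. SketchBuffer
and both method names are made up for illustration; the real push_data
in the proposal may differ in signature and behaviour. The model assumes
the stream order is "all buffered text, then all buffered bytes".

```python
class SketchBuffer:
    """Hypothetical model of the (text, binary) buffer pair."""

    def __init__(self, text, binary, encoding="utf-8"):
        self.text = text
        self.binary = binary
        self.encoding = encoding

    def push_data_naive(self, text, binary):
        # Reading implied by the docstring: prepend each segment to its
        # own buffer. The pushed bytes end up *after* the old text.
        self.text = text + self.text
        self.binary = binary + self.binary

    def push_data_ordered(self, text, binary):
        # Intended reading: pushed data precedes everything buffered.
        if self.text:
            # Existing text must follow the pushed bytes, so the bytes
            # have to be decoded now (and must be decodable on their own).
            self.text = text + binary.decode(self.encoding) + self.text
        else:
            # No buffered text: the pushed bytes can stay as bytes.
            self.text = text
            self.binary = binary + self.binary


naive = SketchBuffer("foo", b"bar")
naive.push_data_naive("quux", b"baz")
print(naive.text, naive.binary)      # quuxfoo b'bazbar'

ordered = SketchBuffer("foo", b"bar")
ordered.push_data_ordered("quux", b"baz")
print(ordered.text, ordered.binary)  # quuxbazfoo b'bar'
```

Reading the naive result back in stream order gives quux, foo, baz, bar,
which puts the pushed bytes after text that was already buffered. The
ordered variant gives quux, baz, foo, bar.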
-Rob
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services