[Python-ideas] BufferedIO and detach

Guido van Rossum guido at python.org
Mon Mar 4 06:50:52 CET 2013


On Sun, Mar 3, 2013 at 6:31 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> On 4 March 2013 05:16, Benjamin Peterson <benjamin at python.org> wrote:
>> Robert Collins <robertc at ...> writes:
>>
>>>
>>> There doesn't seem to be a way to safely use detach() on stdin - I'd
>>> like to get down to the raw stream, but after calling detach(), the
>>> initial BufferedIOReader is unusable (so you cannot retrieve any
>>> buffered content) - and unless you detach(), you can't guarantee that
>>> the buffer will ever be empty.
>>
>> Presumably if you call it before anyone else has had a chance to read from it,
>> you should be okay.
>
> That's hard to guarantee in the general case: consider a library
> utility that accepts an input stream. To make it concrete, consider
> dispatching to different processors based on the first few bytes of a
> stream: you'd have to force raw IO handling everywhere, rather than
> just the portion of code that needs it...

The solution would seem obvious: detach before reading anything from the stream.

But apparently you're trying to come up with a reason why that's not
enough. I think you're concerned about the situation where you have a
stream of uncertain origin, and you want to switch to raw, unbuffered
I/O. You realize that some of the bytes you are interested in might
already have been read into the buffer. So you want access to the
contents of the buffer.
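
A minimal sketch of that situation, using a pipe as a stand-in for
stdin (the function name and the byte strings are made up for
illustration). It reads a few bytes through a BufferedReader, detaches,
and then looks at what the raw stream still has to offer:

```python
import io
import os


def demo_lost_buffer():
    # A pipe is not seekable, so bytes pulled into the buffer cannot be re-read.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"MAGIC rest of the stream")
    os.close(write_fd)

    buffered = io.BufferedReader(io.FileIO(read_fd, "r"))

    # Asking for 5 bytes makes the buffer read a whole block from the raw stream.
    head = buffered.read(5)

    # detach() hands back the raw stream; the buffered bytes stay behind.
    raw = buffered.detach()
    tail = raw.read()

    # The detached BufferedReader refuses further use.
    try:
        buffered.read()
        still_usable = True
    except ValueError:
        still_usable = False

    raw.close()
    return head, tail, still_usable
```

On CPython, tail comes back empty here: everything after "MAGIC" was
sitting in the buffer at the moment of detach() and is no longer
reachable.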

When the io module was originally designed, this was actually one of
the (implied) use cases -- one reason I wanted to stop using C stdio
was that I didn't like that there is no standard way to get at the
data in the buffer, in use cases similar to the one you're presenting.
(A use case I could think of would be an http server that forks a
subprocess after reading e.g. the first line of the http request, or
perhaps after the headers.)

It seems that when the io module was rewritten in C for speed (and
I am very grateful that it was, the Python version was way too slow)
this use case, being pretty rare, was forgotten. In specific use cases
it's usually easy enough to just open the file unbuffered, or detach
before reading anything.
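
Both of those easy cases can be sketched as follows. The helper names,
the temporary file and its contents are assumptions for the example,
not anything from the thread:

```python
import os
import tempfile


def sniff_unbuffered(path, n=4):
    # buffering=0 makes open() return the raw FileIO object directly.
    with open(path, "rb", buffering=0) as raw:
        magic = raw.read(n)  # a raw read may return fewer than n bytes
        rest = raw.read()
    return magic, rest


def sniff_after_early_detach(path, n=4):
    buffered = open(path, "rb")
    # Nothing has been read yet, so the buffer is still empty at this point.
    raw = buffered.detach()
    with raw:
        magic = raw.read(n)
        rest = raw.read()
    return magic, rest


def demo():
    # Write a small file to sniff, run both variants, then clean up.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x89PNGpayload")
        return sniff_unbuffered(path), sniff_after_early_detach(path)
    finally:
        os.remove(path)
```

Neither variant helps once some other code has already read through
the buffered object, which is the case under discussion.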

Can you write C code? If so, perhaps you can come up with a patch.
Personally, I'm not sure that your proposed API (a buffered_only flag
to read()) is the best way to go about it. Maybe detach() should
return the remaining buffered data? (Perhaps only if a new flag is
given.)
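
Until something like that exists, a rough approximation in pure Python
might lean on peek(). This is only a sketch: detach_keeping_buffer is a
hypothetical helper, not an io module API, and it relies on peek() with
no argument returning everything currently buffered, which is how
CPython behaves but is not spelled out as a guarantee. If the buffer
happens to be empty, peek() performs one raw read and may block.

```python
import io
import os


def detach_keeping_buffer(buffered):
    """Hypothetical helper: detach and return (raw, leftover_bytes)."""
    # peek() does not consume; the data stays in the buffer, and the
    # buffer is then discarded by detach(), so we keep our own copy.
    leftover = bytes(buffered.peek())
    raw = buffered.detach()
    return raw, leftover


def demo():
    # A pipe stands in for a socket or stdin.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"GET / HTTP/1.0\r\nHost: example\r\n\r\n")
    os.close(write_fd)

    buffered = io.BufferedReader(io.FileIO(read_fd, "r"))
    request_line = buffered.readline()

    raw, leftover = detach_keeping_buffer(buffered)
    # Buffered leftovers first, then whatever the raw stream still holds.
    rest = leftover + raw.read()
    raw.close()
    return request_line, rest
```

The caller has to consume leftover before touching the raw stream,
which is the same contract a detach() that returns the buffered data
would impose.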

FWIW I think it's also possible that some of the data has made it into
the text wrapper already, so you'll have to be able to extract it from
there as well. (Good luck.)
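
A small illustration of that extra layer, again with a pipe and made-up
data. After one readline() on the text wrapper, the second line has
already been decoded and is held inside the wrapper, so detaching the
wrapper leaves nothing to read from the buffered layer underneath:

```python
import io
import os


def demo_text_layer():
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"first line\nsecond line\n")
    os.close(write_fd)

    buffered = io.BufferedReader(io.FileIO(read_fd, "r"))
    text = io.TextIOWrapper(buffered, encoding="ascii")

    # readline() pulls a whole chunk up from the buffered layer and decodes it.
    first = text.readline()

    # TextIOWrapper.detach() returns the underlying buffered stream.
    below = text.detach()
    remaining = below.read()
    below.close()
    return first, remaining
```

On CPython, remaining is empty: "second line" was stranded in the text
wrapper, and there is no public API to get it back out.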

-- 
--Guido van Rossum (python.org/~guido)