
On 4 March 2013 18:50, Guido van Rossum <guido@python.org> wrote:
On Sun, Mar 3, 2013 at 6:31 PM, Robert Collins <robertc@robertcollins.net> wrote:
On 4 March 2013 05:16, Benjamin Peterson <benjamin@python.org> wrote:
Robert Collins <robertc@...> writes:
There doesn't seem to be a way to safely use detach() on stdin. I'd like to get down to the raw stream, but after calling detach(), the initial BufferedReader is unusable (so you cannot retrieve any buffered content), and unless you detach(), you can't guarantee that the buffer will ever be empty.
Presumably if you call it before anyone else has had a chance to read from it, you should be okay.
That's hard to guarantee in the general case: consider a library utility that accepts an input stream. To make it concrete, consider dispatching to different processors based on the first few bytes of a stream: you'd have to force raw IO handling everywhere, rather than just in the portion of code that needs it...
The solution would seem obvious: detach before reading anything from the stream.
But apparently you're trying to come up with a reason why that's not enough. I think you're concerned about the situation where you have a stream of uncertain origin, and you want to switch to raw, unbuffered I/O. You realize that some of the bytes you are interested in might already have been read into the buffer. So you want access to the contents of the buffer.
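The situation described above can be reproduced in a few lines. This is only a sketch under stated assumptions: it uses a POSIX pipe in place of stdin, and the function name `stranded_after_detach` is made up for illustration. It reads one byte through the buffered layer, detaches, and then shows that the remaining bytes are no longer reachable from either object.

```python
import os


def stranded_after_detach(payload=b"abcdef"):
    """Illustrative helper (not part of the io module)."""
    r, w = os.pipe()
    os.write(w, payload)
    os.close(w)  # writer closed: raw reads hit EOF instead of blocking

    buffered = os.fdopen(r, "rb")  # io.BufferedReader wrapping a FileIO
    first = buffered.read(1)       # the raw read behind this fills the buffer

    raw = buffered.detach()        # buffered object is unusable from here on
    try:
        buffered.peek()
        still_usable = True
    except ValueError:
        still_usable = False

    rest = raw.read()              # pipe is already drained, so this is EOF
    raw.close()
    return first, rest, still_usable
```

With a small payload the whole pipe content ends up in the buffer on the first read, so the raw stream has nothing left to return and the other bytes are stranded.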
Yes, exactly. A little more context on how I came to ask the question: I wanted to accumulate all input on an arbitrary stream within 5ms, without blocking for longer. Using raw IO + select, it's possible to loop, reading one byte at a time. The io module doesn't have an API (that I could find) for putting an existing stream into non-blocking mode, so reading a larger amount and taking what is returned isn't viable. However, without raw I/O, select() will time out because it consults the underlying file descriptor, bypassing the buffer. So the only reason to want raw I/O is to be able to use select reliably. An alternative would be being able to drain the buffer with no underlying I/O calls at all, then use select + read1, then rinse and repeat.
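The raw I/O + select loop described above might look roughly like the following. It is a sketch, not the code from the original use case: the name `read_for` and its parameters are assumptions, it works on a bare file descriptor via `os.read`, and it assumes a POSIX pipe or socket, where a read issued after select reports readiness returns the bytes that are available rather than waiting to fill the request. Passing `chunk=1` gives the byte-at-a-time behaviour mentioned in the message.

```python
import os
import select
import time


def read_for(fd, window=0.005, chunk=4096):
    """Collect whatever arrives on fd within `window` seconds (hypothetical helper)."""
    deadline = time.monotonic() + window
    pieces = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # select consults the file descriptor only; it knows nothing about
        # data held in a Python-level buffer, which is why fd must be raw.
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            break  # window elapsed with nothing more to read
        data = os.read(fd, chunk)
        if not data:
            break  # EOF
        pieces.append(data)
    return b"".join(pieces)
```

The loop only stays within the time window if nothing has already been pulled into a buffered reader above the descriptor, which is the guarantee the thread is asking for.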
When the io module was originally designed, this was actually one of the (implied) use cases -- one reason I wanted to stop using C stdio was that I didn't like that there is no standard way to get at the data in the buffer, in similar use cases as you're trying to present. (A use case I could think of would be an http server that forks a subprocess after reading e.g. the first line of the http request, or perhaps after the headers.)
That's a very similar case, as it happens - protocol handling is present in my use case too.
It seems that when the io module was rewritten in C for speed (and I am very grateful that it was, the Python version was way too slow) this use case, being pretty rare, was forgotten. In specific use cases it's usually easy enough to just open the file unbuffered, or detach before reading anything.
Can you write C code? If so, perhaps you can come up with a patch. Personally, I'm not sure that your proposed API (a buffered_only flag to read()) is the best way to go about it. Maybe detach() should return the remaining buffered data? (Perhaps only if a new flag is given.)
FWIW I think it's also possible that some of the data has made it into the text wrapper already, so you'll have to be able to extract it from there as well. (Good luck.)
I can write C code, and if evolving the API is acceptable (it sounds like it is) I'll be more than happy to make a patch. Some variations I can think of:
 - The buffer_only flag I suggested, on readinto, read1, read etc.
 - Have detach return the buffered data as you suggest - that would be incompatible unless we stash it on the raw object somewhere, or do something along those lines.
 - A read0 - analogous to read1, returns data from the buffer, but guarantees no underlying calls.
I think exposing the buffer more explicitly is a good principle, independent of whether we change detach or not.
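To make the read0 idea concrete, here is a pure-Python sketch of the intended semantics. Nothing in it is an existing io API: `DrainableReader`, its `read0` method, and the `CountingRaw` test double are all hypothetical names, and the real change would live in the C BufferedReader. The point is only that read0 hands back what is already buffered and never touches the raw stream.

```python
import io


class CountingRaw(io.BytesIO):
    """Stand-in raw stream that counts how often read() is called."""

    def __init__(self, data):
        super().__init__(data)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return super().read(size)


class DrainableReader:
    """Hypothetical buffered reader exposing a read0(); not part of io."""

    def __init__(self, raw, buffer_size=io.DEFAULT_BUFFER_SIZE):
        self.raw = raw
        self._size = buffer_size
        self._buf = b""

    def read1(self, n):
        # At most one raw read, and only when the buffer is empty.
        if not self._buf:
            self._buf = self.raw.read(self._size) or b""
        data, self._buf = self._buf[:n], self._buf[n:]
        return data

    def read0(self):
        # Return everything buffered; never calls into the raw stream.
        data, self._buf = self._buf, b""
        return data
```

A caller could then drain the buffer, switch to select on the raw descriptor, and resume with read1:

```python
raw = CountingRaw(b"GET / HTTP/1.1\r\n")
reader = DrainableReader(raw)
method = reader.read1(3)   # one raw read fills the buffer
leftover = reader.read0()  # buffered remainder, no further raw reads
```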
-- --Guido van Rossum (python.org/~guido)
-- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Cloud Services