[Python-3000] Draft PEP for New IO system

Adam Olsen rhamph at gmail.com
Sun Mar 4 23:42:17 CET 2007


On 3/4/07, Daniel Stutzbach <daniel.stutzbach at gmail.com> wrote:
> On 3/1/07, Adam Olsen <rhamph at gmail.com> wrote:
> > Why do non-blocking operations need to use the same methods when
> > the semantics are clearly different?  Although long,
> > .nonblockflush() would be explicit and allow .flush() to still block.
>
> .nonblockflush() would be fine with me, but I don't think .flush()
> should block on a non-blocking object.  To accomplish that, it would
> either have to be smart enough to switch the object into blocking
> mode, or internally use select().

Either would work, if we decide to support it.
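
For the select() route, a minimal sketch of what flush() might do
(self._fd and self._buffer are assumed attributes of a hypothetical
buffered wrapper, not names from the draft PEP):

    import errno
    import os
    import select

    def flush(self):
        # Sketch: block until the non-blocking descriptor is
        # writable, then drain the buffer (a bytearray here).
        while self._buffer:
            select.select([], [self._fd], [])  # wait for writability
            try:
                n = os.write(self._fd, self._buffer)
            except OSError as e:
                if e.errno != errno.EAGAIN:
                    raise
                continue  # lost the race to the kernel; wait again
            del self._buffer[:n]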


> How about having .flush() write as much as it can, and throw an exception if
> not all of the bytes can be written to the device?  (again, this would
> only come up when a user has set the underlying file descriptor to
> non-blocking mode)

I see little point in having an interface that randomly fails
depending on the stars, phase of the moon, etc.  If the application
is using the interface incorrectly, then we should fail every time.

Errors should never pass silently.
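
To make "fail every time" concrete: flush() could refuse a
non-blocking descriptor up front, rather than succeeding whenever the
kernel buffer happens to have room.  A rough sketch (the _nonblocking
attribute and the helper are my own placeholders):

    def flush(self):
        # Sketch of failing deterministically: reject the blocking
        # API on a non-blocking descriptor instead of failing only
        # when the kernel buffer is full.
        if self._nonblocking:  # hypothetical attribute
            raise IOError("flush() called on a non-blocking stream; "
                          "use nonblockflush() instead")
        self._blocking_flush()  # hypothetical normal blocking path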


> > I'm especially wary of infinite buffers.  They allow a malicious peer
> > to consume all your memory, DoSing the process or even the whole box
> > if Linux's OOM killer doesn't kick in fast enough.
>
> For a write-buffer, you start eating up memory only if an application
> is buffer-ignorant and tries to dump a massive amount of data to the
> socket all at once.  A naive HTTP server implementation might do this
> by calling something like s.write(open(filename).read()).  This
> isn't a DoS by a peer though, it's a local implementation problem.

Any application expecting a blocking file and getting a non-blocking
one is buffer-ignorant.  How is this odd way of failing useful?
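
A bounded write buffer would at least turn that implementation
problem into a loud, local failure.  A toy sketch (the class name and
limit are invented, not from the draft PEP):

    class BoundedWriteBuffer:
        # Toy sketch: cap the buffer so a buffer-ignorant caller
        # fails loudly instead of consuming all of memory.
        def __init__(self, raw, limit=1 << 20):
            self.raw = raw
            self.limit = limit
            self.buf = bytearray()

        def write(self, data):
            if len(self.buf) + len(data) > self.limit:
                raise IOError("write buffer limit exceeded")
            self.buf.extend(data)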


> For a read-buffer, you start eating up all of memory only if you call
> .read() with no arguments and the peer dumps a few gig on you.  If you
> call read(n) to get as much data as you need, then the buffer class
> will only grab reasonably sized chunks from the network.  Network
> applications don't normally call .read() without arguments, since they
> need to communicate both ways.  If the object is an ordinary file,
> then DoS isn't so much of an issue and reading the whole files seems
> very reasonable.
>
> I suppose for a Text object wrapped around a socket, .readline()
> could be dangerous if a malicious peer sends a few gig all on one
> line.  That's a problem for the encoding layer to sort out, not the
> buffering layer though.

A networked application should never read an unlimited amount from a
socket; it should always use fixed-size blocks or fixed-size lines.
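
In practice that means always passing a size cap, for example (the
MAX_LINE value is an arbitrary number I'm picking for illustration):

    MAX_LINE = 8192  # arbitrary illustrative cap

    def read_request_line(f):
        # Sketch: fixed-size line read; an over-long line is a
        # protocol error, not something to buffer indefinitely.
        line = f.readline(MAX_LINE)
        if len(line) == MAX_LINE and not line.endswith(b'\n'):
            raise ValueError("line longer than %d bytes" % MAX_LINE)
        return line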

The rare application that requires processing all of the contents at
once should first write them to disk (which has a much larger
capacity), then read them back in a limited amount at a time.
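
A minimal version of that pattern (the chunk size is arbitrary;
tempfile and shutil do the heavy lifting):

    import shutil
    import tempfile

    def spool_to_disk(src):
        # Sketch: copy the whole stream to disk in fixed-size
        # chunks, then hand back a seekable file for re-reading
        # a limited amount at a time.
        spool = tempfile.TemporaryFile()
        shutil.copyfileobj(src, spool, length=64 * 1024)
        spool.seek(0)
        return spool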

I can see three different behaviours when reading from a file or
socket (sketched after the list):
* Blocking, read all.  Returns exactly the amount specified, raising
an exception if there is a short read.  If n is None (the default) it
reads the full contents of the file.
* Blocking, chunked.  Returns n bytes unless the end is hit, in which
case it returns a short read.  If n is None it uses a default size.
* Non-blocking, chunked.  Returns however many bytes it feels like, up
to a maximum of n.  If n is None it uses a default size.  Empty reads
indicate an error.
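
As helper functions over a raw read(n) that may return short, the
three might look roughly like this (DEFAULT_SIZE and the error types
are placeholders of mine; the n-is-None read-all case is omitted from
the first helper for brevity):

    DEFAULT_SIZE = 8192  # placeholder default chunk size

    def read_exact(f, n):
        # Blocking, read all: loop until exactly n bytes arrive;
        # hitting EOF first (a short read) is an error.
        data = b''
        while len(data) < n:
            chunk = f.read(n - len(data))
            if not chunk:
                raise EOFError("short read: got %d of %d bytes"
                               % (len(data), n))
            data += chunk
        return data

    def read_chunk(f, n=None):
        # Blocking, chunked: up to n bytes, short only at the end.
        return f.read(n if n is not None else DEFAULT_SIZE)

    def read_nonblocking(f, n=None):
        # Non-blocking, chunked: whatever is ready, at most n bytes;
        # an empty result signals an error rather than EOF.
        data = f.read(n if n is not None else DEFAULT_SIZE)
        if not data:
            raise IOError("empty non-blocking read")
        return data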

Both blocking modes are very similar, differing only in their
defaults (read all vs. read whatever is available) and their handling
of the end of the file.  I'm not convinced combining them (as Python
has traditionally done) is optimal, but it's not a big deal.

The non-blocking, chunked mode is very different, however.  It can
return a short read at any point.  Applications expecting blocking
mode may get empty strings (or exceptions indicating as much, which
is perhaps the only saving grace).

A name like .nonblockingread() is long, but I expect it to be
wrapped by the event loop anyway.
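
For illustration, an event loop might hide it like this (the
dispatcher shape and chunk size are invented; .nonblockingread() is
the method name proposed in this thread, not an existing API):

    import select

    def poll_readers(readers):
        # Sketch: `readers` maps streams (each with fileno() and the
        # proposed nonblockingread() method) to data callbacks; the
        # loop keeps the non-blocking reads out of application code.
        while readers:
            ready, _, _ = select.select(list(readers), [], [])
            for stream in ready:
                data = stream.nonblockingread(4096)
                if not data:
                    del readers[stream]  # empty read signals error
                else:
                    readers[stream](data)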

-- 
Adam Olsen, aka Rhamphoryncus

