[Python-3000] Draft PEP for New IO system

Guido van Rossum guido at python.org
Tue Feb 27 17:38:23 CET 2007


On 2/27/07, Paul Moore <p.f.moore at gmail.com> wrote:
[...]
> Documenting the revised open() factory in this PEP would be useful. It
> needs to address encoding issues, so it's not a simple copy of the
> existing open().

Check the doc again. I added one at the end. It could use some review.
I also added an elaboration into the p3yk branch in svn; that could
use some review as well.

> Also, should there be a factory method for opening raw byte streams?

The open() I added returns a raw byte stream when you specify binary
mode with buffering=0.
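To illustrate the distinction, here is a small sketch using the io layer as it eventually shipped: binary mode with buffering=0 yields the raw (unbuffered) byte-stream object rather than a buffered wrapper.

```python
import io
import os
import tempfile

# Binary mode with buffering=0 gives the raw byte stream directly.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb", buffering=0) as f:
    assert isinstance(f, io.RawIOBase)  # raw layer, no buffering on top
    f.write(b"abc")

with open(path, "rb", buffering=0) as f:
    data = f.read()

os.remove(path)
```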

> Once we start down this route, we open the can of worms, of course
> (does socket.socket need to be specified in terms of the new IO
> layers?

No, but check the io.py in svn; it has a SocketIO class that wraps a
socket. Sockets themselves are much lower level than this; they have
all sorts of other APIs. The SocketIO class only works for stream
sockets (e.g., TCP/IP).
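The wrapper idea can be sketched in a few lines; this is a hypothetical minimal version for illustration, not the actual class from io.py in svn. It adapts a stream socket to the raw-bytes read/write protocol; datagram sockets don't fit this model.

```python
import socket

class SocketIO:
    """Toy sketch: adapt a *stream* socket to raw read()/write()."""

    def __init__(self, sock):
        self._sock = sock

    def read(self, n):
        # recv() maps naturally onto a raw byte read
        return self._sock.recv(n)

    def write(self, data):
        # send() returns the number of bytes written, like raw write()
        return self._sock.send(data)

    def close(self):
        self._sock.close()

# Demonstrate with a connected pair of stream sockets.
a, b = socket.socketpair()
w, r = SocketIO(a), SocketIO(b)
w.write(b"ping")
msg = r.read(4)
w.close()
r.close()
```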

> what about the mmap module, the gzip/zipfile/tarfile modules,
> etc?) These should probably be noted in an "open issues" section, and
> otherwise deferred for now.

Agreed that we should add these to the open issues section. I don't
think we should mess with mmap, but *perhaps* a mmap wrapper could be
provided (by the mmap module). gzip, bzip2 etc. should probably be
redefined in terms of the buffered (bytes) reader/writer protocol.
zipfile and tarfile should take bytes readers/writers; the API they
*provide* should be defined in terms of bytes and perhaps (when
appropriate, I don't recall if they have read/write methods) in terms
of buffered byte streams.
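The "defined in terms of a bytes reader/writer" idea already works with today's gzip module, which accepts any bytes file object via fileobj; here an in-memory BytesIO stands in for a buffered byte stream.

```python
import gzip
import io

# gzip consumes/produces a bytes stream; any bytes reader/writer
# (here an in-memory BytesIO) can back it.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"hello world")

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    out = gz.read()
```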

It *may* even be useful if many of these would support non-blocking
I/O; we're currently considering adding a standard API for returning
"EWOULDBLOCK" errors (e.g. return None from read() and write()) --
though we won't be providing an API to turn that on (since it depends
on the underlying implementation, e.g. sockets vs. files).
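This convention did land in the io module as it shipped: a raw file in non-blocking mode returns None from read() when no data is available, rather than raising EWOULDBLOCK. A small demonstration with a pipe (os.set_blocking is the modern way to enable non-blocking mode):

```python
import os

# An empty pipe in non-blocking mode: read() signals "would block"
# by returning None instead of raising.
rfd, wfd = os.pipe()
os.set_blocking(rfd, False)

reader = open(rfd, "rb", buffering=0)
result = reader.read(16)  # nothing written yet -> None, not an error
reader.close()
os.close(wfd)
```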

> > The BufferedReader implementation is for sequential-access read-only
> > objects.  It does not provide a .flush() method, since there is no
> > sensible circumstance where the user would want to discard the read
> > buffer.
>
> It's not something I've done personally, but programs sometimes flush
> a read buffer before (eg) reading a password from stdin, to avoid
> typeahead problems. I don't know if that would be relevant here.

We discussed this briefly at the sprint and came to the conclusion
that this is outside the scope of the PEP; you can do this by
(somehow) enabling non-blocking mode and then reading until you get
None.
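The drain-the-typeahead pattern would then look something like this sketch; the FakeStream is a toy stand-in (not a real API) that returns None once nothing more is available, the way a non-blocking read would.

```python
def drain(stream):
    """Discard pending input until a read would block (returns None)."""
    while True:
        chunk = stream.read(512)
        if chunk is None:   # would block: nothing more buffered
            return
        if chunk == b"":    # EOF
            return

class FakeStream:
    """Toy stand-in: yields queued chunks, then None (would block)."""
    def __init__(self, chunks):
        self._chunks = list(chunks)
    def read(self, n):
        return self._chunks.pop(0) if self._chunks else None

s = FakeStream([b"stale", b"typeahead"])
drain(s)
leftover = s.read(512)  # everything was discarded
```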

> > Another way to do it is as follows (we should pick one or the other):
> >
> >     .__init__(self, buffer, encoding=None, newline=None)
> >
> >        Same as above but if newline is not None use that as the
> > newline pattern (for reading and writing), and if newline is not set
> > attempt to find the newline pattern from the file and if we can't for
> > some reason use the system default newline pattern.
>
> I'm not sure that can work - the point of universal newlines is that
> *any* of \n, \r or \r\n count as a newline, so there's no one pattern.
> So I think that explicitly specifying universal newlines is necessary
> (even though it's clunky).

I think for input we should always accept all three line endings so
you never need to specify anything; for output, we should pick a
platform default (\r\n on Windows, \n everywhere else) and have an API
to override it. So the API you quote above sounds about right:

  .__init__(self, buffer, encoding=None, newline=None)

I'd like to constrain newline to be either \n or \r\n for writing; for
reading IMO it should not be specified.
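This is essentially how the text layer eventually behaved; a sketch using io.TextIOWrapper as it shipped: the default (newline=None here) accepts all three endings on input, while an explicit newline constrains what gets written.

```python
import io

# Reading: all of \r, \n, \r\n are accepted as line endings.
raw = io.BytesIO(b"one\rtwo\r\nthree\n")
text = io.TextIOWrapper(raw, encoding="ascii")
lines = text.readlines()

# Writing: an explicit newline (here \r\n) overrides the default;
# \n characters in the text are translated on output.
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="ascii", newline="\r\n")
writer.write("a\nb\n")
writer.flush()
data = out.getvalue()
```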

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
