[Python-3000] new io (pep 3116)

Guido van Rossum guido at python.org
Mon May 7 21:14:30 CEST 2007


On 5/7/07, tomer filiba <tomerfiliba at gmail.com> wrote:
> my original idea about the new i/o foundation was more elaborate
> than the pep, but i have to admit the pep is more feasible and
> compact. some comments though:
>
> writeline
> -----------------------------
> TextIOBase should grow a writeline() method, to be symmetrical
> with readline(). the reason is simple -- the newline char is
> configurable in the constructor, so it's not necessarily "\n".
> so instead of adding the configurable newline char manually,
> the user should call writeline() which would append the
> appropriate newline automatically.

That's not symmetric. readline() returns a string that includes a
trailing \n even if the actual file contained \r or \r\n. write()
already is supposed to translate \n anywhere (not just at the end of
the line) into the specified or platform-default (os.linesep) separator. A
method writeline() that *appended* a separator would be totally new to
the I/O library. Even writelines() doesn't do that.
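
To illustrate the translation described above, here is a minimal sketch using the io module as it later shipped in Python 3. The in-memory streams, the "ascii" encoding and the sample strings are assumptions chosen for the example, not something taken from the PEP text.

```python
import io

# Writing: every "\n" is translated to the configured separator,
# not only the one at the end of the line.
raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
writer.write("first\nsecond\n")
writer.flush()
data = raw.getvalue()

# Reading: with newline=None, "\r", "\r\n" and "\n" all come back as "\n".
reader = io.TextIOWrapper(io.BytesIO(b"a\rb\r\nc\n"), encoding="ascii", newline=None)
lines = [reader.readline() for _ in range(3)]

# writelines() does not append any separator of its own.
raw2 = io.BytesIO()
writer2 = io.TextIOWrapper(raw2, encoding="ascii", newline="\n")
writer2.writelines(["one", "two"])
writer2.flush()
joined = raw2.getvalue()
```

So a caller who wants a line terminator writes "\n" and lets the text layer pick the on-disk form.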

> sockets
> -----------------------------
> iirc, SocketIO is a layer that wraps an underlying socket object.
> that's a good distinction -- to separate the underlying socket from
> the RawIO interface -- but don't forget socket objects,
> by themselves, need a cleanup too.

But that's out of the scope of the PEP. The main change I intend to
make is to return bytes instead of strings.
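
A small sketch of what "bytes instead of strings" looks like with the socket module in Python 3. The connected pair from socket.socketpair() and the payload are assumptions made only to keep the example self-contained.

```python
import socket

# A connected pair of sockets, so no network setup is needed.
left, right = socket.socketpair()
try:
    left.sendall(b"ping")      # send takes a bytes-like object
    payload = right.recv(4)    # recv returns bytes, not str
finally:
    left.close()
    right.close()
```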

> for instance, there's no point in UDP sockets having listen(), or send()
> or getpeername() -- with UDP you only ever use sendto and recvfrom.
> on the other hand, TCP sockets make no use of sendto(). and even with
> TCP sockets, listeners never use send() or recv(), while connected
> sockets never use listen() or connect().
>
> moreover, the current socket interface simply mimics the BSD
> interface. setsockopt, getsockopt, et al, are very unpythonic by nature --
> they ought to be exposed as properties or methods of the socket.
> all in all, the current socket model is very low level with no high
> level design.

That's all out of scope for the PEP. Also I happen to think that
there's nothing particularly wrong with sockets -- they generally get
wrapped in higher layers like httplib.

> some time ago i was working on a sketch for a new socket module
> (called sock2) which had a clear distinction between connected sockets,
> listener sockets and datagram sockets. each protocol was implemented
> as a subclass of one of these base classes, and exposed only the
> relevant methods. socket options were added as properties and
> methods, and a new DNS module was added for dns-related queries.
>
> you can see it here -- http://sebulba.wikispaces.com/project+sock2
> i know it's late already, but i can write a PEP over the weekend,
> or if someone else wants to carry on with the idea, that's fine
> with me.

Sorry, too late. We're putting serious pressure already on authors who
posted draft PEPs before the deadline but haven't submitted their text
to Subversion yet. At this point we have a definite list of PEPs that
were either checked in or promised on time for the deadline. New
proposals will have to wait until after 3.0a1 is released (hopefully
end of June). Also note that the whole stdlib reorg is planned to
happen after that release.

> non-blocking IO
> -----------------------------
> the pep says "In order to put an object in non-blocking
> mode, the user must extract the fileno and do it by hand."
> but i think it would only lead to trouble. as the entire IO library
> is being rethought from the ground up, non-blocking IO
> should be taken into account.

Why? Non-blocking I/O makes most of the proposed API useless.
Non-blocking I/O is highly specialized and hard to code against. I'm
all for a standard non-blocking I/O library but this one isn't it.

> non-blocking IO depends greatly on the platform -- and this is
> exactly why a cross-platform language should standardize that
> as part of the new IO layer. saying "let's keep it for later" would only
> require more work at some later stage.

Actually there are only two things platform-specific: how to turn it
on (or off) and how to tell the difference between "this operation
would block" and "there was an error".
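
The two points can be sketched with today's socket module, where setblocking() is the switch and BlockingIOError is the "would block" signal. The helper name classify_recv and the socketpair setup are hypothetical, used only for illustration.

```python
import socket

def classify_recv(sock):
    """Hypothetical helper: separate 'would block' from a real error."""
    try:
        data = sock.recv(16)
    except BlockingIOError:
        # Must come first: BlockingIOError is a subclass of OSError.
        return "would block"
    except OSError:
        return "error"
    return "eof" if data == b"" else "data"

a, b = socket.socketpair()
b.setblocking(False)            # how to turn non-blocking mode on

first = classify_recv(b)        # nothing has been sent yet

b.close()
second = classify_recv(b)       # receiving on a closed socket is a real error
a.close()
```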

> it's true that SyncIO and AsyncIO don't mingle well with the same
> interfaces. that's why i think they should be two distinct classes.
> the class hierarchy should be something like:
>
> class RawIO:
>     def fileno(self): ...
>     def close(self): ...
>
> class SyncIO(RawIO):
>     def read(self, count): ...
>     def write(self, data): ...
>
> class AsyncIO(RawIO):
>     def read(self, count, timeout): ...
>     def write(self, data, timeout): ...
>     def bgread(self, count, callback): ...
>     def bgwrite(self, data, callback): ...
>
> or something similar. there's no point to add both sync and async
> operations to the RawIO level -- it just won't work together.
> we need to keep the two distinct.

I'd rather cut out all support for async I/O from this library and
leave it for someone else to invent. I don't need it. People who use
async I/O on sockets to implement e.g. fast web servers are unlikely
to use io.py; they have their own API on top of raw sockets + select
or poll.
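
For readers unfamiliar with the "raw sockets + select" style, a minimal sketch follows. It assumes a local socketpair standing in for a client connection, and the request bytes are made up for the example.

```python
import select
import socket

server_side, client_side = socket.socketpair()
server_side.setblocking(False)

# Poll with a zero timeout: nothing is readable yet.
idle, _, _ = select.select([server_side], [], [], 0)

client_side.sendall(b"GET /")

# Wait until the socket is readable, then read without blocking.
ready, _, _ = select.select([server_side], [], [], 5.0)
request = server_side.recv(1024) if ready else None

server_side.close()
client_side.close()
```

No buffered or text layer is involved here, which is the sense in which such servers bypass io.py.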

> buffering should only support SyncIO -- i also don't see much point
> in having buffered async IO. it's mostly used for sockets and devices,
> which are most likely to work with binary data structures rather than
> text, and if you *require* non-blocking mode, buffering will only
> get in your way.
>
> if you really want a buffered AsyncIO stream, you could write a
> compatibility layer that makes the underlying AsyncIO object
> appear synchronous.

I agree with cutting async I/O from the buffered API, *except* for
specifying that when the equivalent of EWOULDBLOCK happens at the
lower level the buffering layer should not retry but raise an error.
I think it's okay if the raw layer has minimal support for async I/O.
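
A sketch of how this division of labour looks in the io module as it later shipped: the raw layer reports "would block" by returning None, and the buffered layer raises BlockingIOError instead of retrying. The StubbornRaw class is hypothetical and only simulates a blocked device; the buffer size is an arbitrary assumption.

```python
import io

class StubbornRaw(io.RawIOBase):
    """Hypothetical raw stream that can simulate a would-block condition."""

    def __init__(self):
        super().__init__()
        self.blocked = True
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, b):
        if self.blocked:
            return None          # raw-layer signal for "would block"
        self.received += bytes(b)
        return len(b)

raw = StubbornRaw()
buffered = io.BufferedWriter(raw, buffer_size=8)

try:
    buffered.write(b"x" * 100)   # too large to sit in the buffer
    raised = False
    accepted = 100
except BlockingIOError as exc:
    raised = True
    accepted = exc.characters_written   # how much the buffer took

# Once the device is writable again, the caller decides when to retry.
raw.blocked = False
buffered.flush()
```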

> records
> -----------------------------
> another addition to the PEP that seems useful to me would be a
> RecordIOBase/Wrapper. records are fixed-length binary data
> structures, defined as format strings of the struct-module.
>
> class RecordIOWrapper:
>     def __init__(self, buffer, format): ...
>     def read(self) -> tuple: ...  # tuple of fields
>     def write(self, *fields): ...

The struct module has the means to build that out of lower-level reads
and writes already. If you think a library module to support this is
needed, write one and make it available as a third party module and
see how many customers you get. Personally I haven't had the need for
files consisting of fixed-length records of the same type since the
mid '80s.
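
As a rough sketch of how little code such a layer needs on top of struct, assuming a hypothetical RecordStream class (the name and interface are illustrative, not an existing library API):

```python
import io
import struct

class RecordStream:
    """Hypothetical fixed-length record reader/writer built on struct."""

    def __init__(self, stream, fmt):
        self._stream = stream
        self._struct = struct.Struct(fmt)

    def read(self):
        data = self._stream.read(self._struct.size)
        if len(data) < self._struct.size:
            raise EOFError("incomplete record")
        return self._struct.unpack(data)

    def write(self, *fields):
        self._stream.write(self._struct.pack(*fields))

buf = io.BytesIO()
records = RecordStream(buf, "!BL")   # one unsigned byte, one unsigned 32-bit int
records.write(1, 5)
buf.seek(0)
fields = records.read()
```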

> another cool feature i can think of is "multiplexing",  or working
> with the same underlying stream in different ways by having multiple
> wrappers over it.

That's why we make the underlying 'raw' object available as an
attribute. So you can experiment with this.
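
In the io module as it later shipped, the layers are reachable as attributes: a text stream exposes its buffered stream as .buffer, and that in turn exposes the raw stream as .raw. A minimal sketch, with a temporary file assumed purely for demonstration:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"abc")
os.close(fd)

with open(path, "r", encoding="ascii") as text:
    buffered = text.buffer      # the buffered layer under the text layer
    raw = buffered.raw          # the raw layer under the buffered layer
    kinds = (type(text).__name__, type(buffered).__name__, type(raw).__name__)

os.remove(path)
```

Note that the buffered layer reads ahead, so mixing reads on the raw object with reads on its wrappers can skip data unless the experiment accounts for that.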

> for example, to implement a type-length-value stream, which is very
> common in communication protocols, one could do something like
>
> class MultiplexedIO:
>     def __init__(self, *streams):
>         self.streams = itertools.cycle(streams)
>     def read(self, *args):
>         """read from the next stream each time it's called"""
>         return next(self.streams).read(*args)
>
> sock = BufferedRW(SocketIO(...))
> tlrec = Record(sock, "!BL")
> tlv = MultiplexedIO(tlrec, sock)
>
> type, length = tlv.read()
> value = tlv.read(length)
>
> you can also build higher-level state machines with that -- for instance,
> if the type was "int", the next call to read() would decode the value as
> an integer, and so on. you could write parsers right on top of the IO
> layer.
>
> just an idea. i'm not sure if that's proper design or just a silly idea,
> but we'll leave that to the programmer.

I don't think the new I/O library is the place to put in a bunch of
new, essentially untried ideas. Instead, we should aim for a flexible
implementation of APIs that we know work and are needed. I think the
current stack is pretty flexible in that it supports streams and
random access, unidirectional and bidirectional, raw and buffered,
bytes and text. Applications can do a lot with those.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

