[Python-3000] new io (pep 3116)

Mon May 7 13:08:04 CEST 2007

my original idea about the new i/o foundation was more elaborate
than the pep, but i have to admit the pep is more feasible and
compact. some comments though:

writeline
-----------------------------
TextIOBase should grow a writeline() method, to be symmetrical
with readline(). the reason is simple -- the newline char is
configurable in the constructor, so it's not necessarily "\n".
so instead of adding the configurable newline char manually,
the user should call writeline() which would append the
appropriate newline automatically.
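
a minimal sketch of the idea, assuming a hypothetical wrapper class (LineWriter is not part of the pep) that remembers the newline it was configured with and appends it on every call:

```python
import io

class LineWriter:
    """hypothetical wrapper: a text stream plus its configured newline."""

    def __init__(self, stream, newline="\n"):
        self.stream = stream
        self.newline = newline

    def writeline(self, line):
        # append the configured terminator, so callers never hard-code "\n"
        return self.stream.write(line + self.newline)

out = io.StringIO()
writer = LineWriter(out, newline="\r\n")
writer.writeline("hello")
writer.writeline("world")
```

in a real TextIOBase the method would live on the stream itself and read the newline passed to the constructor; the wrapper is only there to keep the sketch self-contained.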

sockets
-----------------------------
iirc, SocketIO is a layer that wraps an underlying socket object.
that's a good distinction -- to separate the underlying socket from
the RawIO interface -- but don't forget socket objects,
by themselves, need a cleanup too.

for instance, there's no point in UDP sockets having listen(), or send()
or getpeername() -- with UDP you only ever use sendto and recvfrom.
on the other hand, TCP sockets make no use of sendto(). and even with
TCP sockets, listeners never use send() or recv(), while connected
sockets never use listen() or connect().

moreover, the current socket interface simply mimics the BSD
interface. setsockopt, getsockopt, et al, are very unpythonic by nature --
they ought to be exposed as properties or methods of the socket.
all in all, the current socket model is very low level with no high
level design.
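
to illustrate what "options as properties" could look like, here is a rough sketch. the Socket class and its reuse_address property are hypothetical names made up for this example; only the underlying setsockopt/getsockopt calls are the existing socket API.

```python
import socket

class Socket:
    """hypothetical wrapper exposing one socket option as a property."""

    def __init__(self, family=socket.AF_INET, kind=socket.SOCK_STREAM):
        self._sock = socket.socket(family, kind)

    @property
    def reuse_address(self):
        # getsockopt returns an int; any non-zero value means "enabled"
        value = self._sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
        return bool(value)

    @reuse_address.setter
    def reuse_address(self, enabled):
        self._sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR,
                              int(bool(enabled)))

    def close(self):
        self._sock.close()

s = Socket()
s.reuse_address = True      # instead of s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
enabled = s.reuse_address
s.close()
```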

some time ago i was working on a sketch for a new socket module
(called sock2) which had a clear distinction between connected sockets,
listener sockets and datagram sockets. each protocol was implemented
as a subclass of one of these base classes, and exposed only the
relevant methods. socket options were added as properties and
methods, and a new DNS module was added for dns-related queries.
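
the following is not the actual sock2 code, just a small sketch of the kind of split described above. the class names are assumptions chosen for the example; each class exposes only the operations that make sense for its role.

```python
import socket

class ConnectedSocket:
    """stream endpoint: send/recv only, no listen() or sendto()."""
    def __init__(self, sock):
        self._sock = sock
    def send(self, data):
        self._sock.sendall(data)
    def recv(self, count):
        return self._sock.recv(count)
    def close(self):
        self._sock.close()

class ListenerSocket:
    """passive endpoint: accept() only, no send() or recv()."""
    def __init__(self, host, port, backlog=5):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.bind((host, port))
        self._sock.listen(backlog)
    def accept(self):
        conn, _addr = self._sock.accept()
        return ConnectedSocket(conn)
    def close(self):
        self._sock.close()

class DatagramSocket:
    """datagram endpoint: sendto/recvfrom only."""
    def __init__(self):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    def sendto(self, data, addr):
        return self._sock.sendto(data, addr)
    def recvfrom(self, count):
        return self._sock.recvfrom(count)
    def close(self):
        self._sock.close()

# socketpair() gives two already-connected endpoints without touching the network
left, right = (ConnectedSocket(s) for s in socket.socketpair())
left.send(b"ping")
reply = right.recv(4)
left.close()
right.close()
```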

you can see it here -- http://sebulba.wikispaces.com/project+sock2
i know it's late already, but i can write a PEP over the weekend,
or if someone else wants to carry on with the idea, that's fine
with me.

non-blocking IO
-----------------------------
the pep says "In order to put an object in non-blocking
mode, the user must extract the fileno and do it by hand."
but i think it would only lead to trouble. as the entire IO library
is being rethought from the ground up, non-blocking IO
should be taken into account.
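
for reference, "doing it by hand" looks roughly like this on a unix-like system. the sketch uses os.set_blocking(), which exists in current python versions (older code would go through fcntl instead), and a pipe stands in for whatever object the descriptor came from:

```python
import os

r, w = os.pipe()
os.set_blocking(r, False)   # the "by hand" step, done on the raw descriptor

try:
    os.read(r, 1)           # nothing has been written yet
    would_block = False
except BlockingIOError:
    # a non-blocking read with no data raises instead of waiting
    would_block = True

os.close(r)
os.close(w)
```

every layer stacked on top of that descriptor now has to cope with reads that fail this way, which is the kind of trouble i mean.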

non-blocking IO depends greatly on the platform -- and this is
exactly why a cross-platform language should standardize it
as part of the new IO layer. saying "let's keep it for later" would only
require more work at some later stage.

it's true that SyncIO and AsyncIO don't mingle well with the same
interfaces. that's why i think they should be two distinct classes.
the class hierarchy should be something like:

class RawIO:
    def fileno(self): ...
    def close(self): ...

class SyncIO(RawIO):
    def read(self, count): ...
    def write(self, data): ...

class AsyncIO(RawIO):
    def read(self, count, timeout): ...
    def write(self, data, timeout): ...
    def bgread(self, count, callback): ...
    def bgwrite(self, data, callback): ...

or something similar. there's no point in adding both sync and async
operations at the RawIO level -- they just won't work together.
we need to keep the two distinct.
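
a rough sketch of how the timeout-taking read() could behave, assuming a unix-like system where select() works on pipes. returning None on timeout is my assumption for the example, and bgread/bgwrite are left out because they would need an event loop or a worker thread:

```python
import os
import select

class RawIO:
    def __init__(self, fd):
        self._fd = fd
    def fileno(self):
        return self._fd
    def close(self):
        os.close(self._fd)

class AsyncIO(RawIO):
    def read(self, count, timeout):
        # wait up to `timeout` seconds for the descriptor to become readable
        ready, _, _ = select.select([self._fd], [], [], timeout)
        if not ready:
            return None                 # assumption: None signals a timeout
        return os.read(self._fd, count)

r, w = os.pipe()
stream = AsyncIO(r)
before = stream.read(10, 0.01)          # nothing written yet
os.write(w, b"abc")
after = stream.read(10, 1.0)
stream.close()
os.close(w)
```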

buffering should only support SyncIO -- i also don't see much point
in having buffered async IO. it's mostly used for sockets and devices,
which are most likely to work with binary data structures rather than
text, and if you *require* non-blocking mode, buffering will only
get in your way.

if you really want a buffered AsyncIO stream, you could write a
compatibility layer that makes the underlying AsyncIO object
appear synchronous.
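
such a compatibility layer could be as small as the sketch below. SyncAdapter and FakeAsyncStream are hypothetical names; the fake stream only imitates an AsyncIO object whose read(count, timeout) returns None when it times out.

```python
class FakeAsyncStream:
    """stand-in for an AsyncIO object: replays a scripted list of results."""
    def __init__(self, results):
        self._results = list(results)
    def read(self, count, timeout):
        # ignores count and timeout; None plays the role of "timed out"
        return self._results.pop(0)

class SyncAdapter:
    """makes an async stream look like a blocking one."""
    def __init__(self, async_stream, poll_timeout=0.1):
        self._stream = async_stream
        self._poll_timeout = poll_timeout
    def read(self, count):
        # keep retrying until the async read produces something
        while True:
            data = self._stream.read(count, self._poll_timeout)
            if data is not None:
                return data

stream = SyncAdapter(FakeAsyncStream([None, None, b"data"]))
result = stream.read(4)
```

the buffering layer can then be stacked on top of the adapter without knowing anything about timeouts.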

records
-----------------------------
another addition to the PEP that seems useful to me would be a
RecordIOBase/Wrapper. records are fixed-length binary data
structures, defined as format strings of the struct-module.

class RecordIOWrapper:
    def __init__(self, buffer, format): ...
    def read(self): ...               # returns a tuple of fields
    def write(self, *fields): ...
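
a minimal working version of that interface, built on the struct module. raising EOFError on a short read is my assumption; the proposal doesn't say what should happen there.

```python
import io
import struct

class RecordIOWrapper:
    def __init__(self, buffer, format):
        self.buffer = buffer
        self.struct = struct.Struct(format)   # precompiled format, knows its size

    def read(self):
        data = self.buffer.read(self.struct.size)
        if len(data) < self.struct.size:
            raise EOFError("incomplete record")
        return self.struct.unpack(data)       # tuple of fields

    def write(self, *fields):
        return self.buffer.write(self.struct.pack(*fields))

buf = io.BytesIO()
rec = RecordIOWrapper(buf, "!BL")   # 1-byte field, then 4-byte big-endian field
rec.write(1, 5)
buf.seek(0)
fields = rec.read()
```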

another cool feature i can think of is "multiplexing", or working
with the same underlying stream in different ways by having multiple
wrappers over it.

for example, to implement a type-length-value stream, which is very
common in communication protocols, one could do something like

import itertools

class MultiplexedIO:
    def __init__(self, *streams):
        self.streams = itertools.cycle(streams)
    def read(self, *args):
        """read from the next stream each time it's called"""
        return next(self.streams).read(*args)

sock = BufferedRW(SocketIO(...))
tlrec = RecordIOWrapper(sock, "!BL")   # type (1 byte), length (4 bytes)
tlv = MultiplexedIO(tlrec, sock)

type, length = tlv.read()
value = tlv.read(length)
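
the same idea as a self-contained sketch that can be run without a socket: an in-memory BytesIO stands in for the buffered socket, and the record wrapper is a cut-down version written just for this example.

```python
import io
import itertools
import struct

class RecordIOWrapper:
    """fixed-length record reader over a binary stream (example-only version)."""
    def __init__(self, buffer, format):
        self.buffer = buffer
        self.struct = struct.Struct(format)
    def read(self):
        return self.struct.unpack(self.buffer.read(self.struct.size))

class MultiplexedIO:
    def __init__(self, *streams):
        self.streams = itertools.cycle(streams)
    def read(self, *args):
        # alternate between the wrapped streams on every call
        return next(self.streams).read(*args)

payload = b"hello"
raw = io.BytesIO(struct.pack("!BL", 7, len(payload)) + payload)

tlrec = RecordIOWrapper(raw, "!BL")
tlv = MultiplexedIO(tlrec, raw)

rtype, length = tlv.read()      # first call goes to the record wrapper
value = tlv.read(length)        # second call goes to the raw stream
```

both wrappers share one underlying stream position, which is what lets the header read and the payload read line up.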

you can also build higher-level state machines with that -- for instance,
if the type was "int", the next call to read() would decode the value as
an integer, and so on. you could write parsers right on top of the IO
layer.
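
a tiny sketch of that kind of type-driven decoding. the tag numbers (1 for integers, 2 for text) and the helper names are invented for the example and don't come from any real protocol.

```python
import io
import struct

HEADER = struct.Struct("!BL")   # type tag (1 byte), length (4 bytes)

DECODERS = {
    1: lambda raw: int.from_bytes(raw, "big"),   # hypothetical tag 1: integer
    2: lambda raw: raw.decode("utf-8"),          # hypothetical tag 2: text
}

def read_value(stream):
    # read the header, then let the type tag pick the decoder for the payload
    tag, length = HEADER.unpack(stream.read(HEADER.size))
    return DECODERS[tag](stream.read(length))

def encode(tag, raw):
    return HEADER.pack(tag, len(raw)) + raw

wire = io.BytesIO(encode(1, (1000).to_bytes(4, "big")) + encode(2, b"hi"))
first = read_value(wire)
second = read_value(wire)
```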

just an idea. i'm not sure if that's proper design or just a silly idea,
but we'll leave that to the programmer.

-tomer