[Python-3000] suggestion for a new socket io stack

Sat Apr 29 00:36:49 CEST 2006

Good ideas. Sockets follow the C API to the letter; this was a good
idea when I first created the socket extension module, but nowadays it
doesn't feel right. Also, the socket.py module contains a lot of code
(a whole buffering file implementation!) that ought to be subsumed
into the new I/O stack.

The resolver APIs (gethostname() etc.) also need to be rethought.

And then there's the select module, which has a very close
relationship with sockets (e.g. on Windows, only sockets are
acceptable to select) -- and the new BSD kqueue API which ought to be
provided when available. (Preferably there should be a way to wait for
multiple sockets without having to decide whether to use select, poll
or kqueue.)
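
A minimal sketch of such a wait, assuming a hypothetical helper called
wait_readable() (the name and signature are illustrative, not an existing
API). It prefers poll() where the select module offers it and falls back to
select() otherwise, so the caller never picks; a kqueue branch could be
added the same way.

```python
import select
import socket


def wait_readable(socks, timeout=None):
    """Return the sockets from `socks` that are ready for reading.

    Hypothetical helper: the mechanism (poll or select) is chosen here,
    not by the caller. `timeout` is in seconds; None blocks indefinitely.
    """
    socks = list(socks)
    if hasattr(select, "poll"):
        poller = select.poll()
        by_fd = {}
        for s in socks:
            by_fd[s.fileno()] = s
            poller.register(s.fileno(), select.POLLIN)
        # poll() expects milliseconds rather than seconds
        ms = None if timeout is None else int(timeout * 1000)
        return [by_fd[fd] for fd, _event in poller.poll(ms)]
    # Fallback for platforms without poll(), e.g. Windows
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable
```

The caller only ever asks which sockets can be read; which system call
produced the answer stays an implementation detail.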

And then there's the exception model. It's a pain that we have
separate OSError, IOError and socket.error classes, and the latter
doesn't even inherit from EnvironmentError. I propose that these all
be unified (and the new class called IOError).

Perhaps a few people could get together and design a truly minimal
socket API along the lines suggested here? The goal ought to be to
support the same set of use cases as supported by the current socket
and select modules but using more appropriate APIs.

Here's how it could interface to the new I/O stack: at the very bottom
of the new I/O stack will be an object supporting unbuffered read and
write operations. o.read(n) returns a bytes object of length at most
n. o.write(b) takes a bytes object and writes some bytes, returning
the number of bytes actually written. (There are also
seek/tell/truncate ops but these aren't relevant here.) On top of this
we can build buffering, encoding, and line ending translation
(covering all the functionality we have in 2.x) but we could also
build alternative functionality on top of buffered byte streams, for
example record-based I/O.
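
To make the layering concrete, here is a rough sketch, assuming two
hypothetical classes: ChunkedRaw stands in for the unbuffered bottom object
(its read(n) returns at most n bytes, often fewer), and BufferedLines is one
possible buffering layer built only on that read(n) contract. Neither name
comes from an existing module.

```python
class ChunkedRaw:
    """Stand-in raw object: read(n) returns at most n bytes, often fewer."""

    def __init__(self, data, chunk=3):
        self._data = data
        self._chunk = chunk

    def read(self, n):
        take = min(n, self._chunk)
        out, self._data = self._data[:take], self._data[take:]
        return out


class BufferedLines:
    """Hypothetical buffering layer that hands out complete lines."""

    def __init__(self, raw, bufsize=4096):
        self._raw = raw
        self._bufsize = bufsize
        self._buf = b""

    def readline(self):
        # Keep pulling short reads until a full line is buffered
        while b"\n" not in self._buf:
            chunk = self._raw.read(self._bufsize)
            if not chunk:
                # End of stream: return whatever is left (possibly b"")
                line, self._buf = self._buf, b""
                return line
            self._buf += chunk
        line, _, self._buf = self._buf.partition(b"\n")
        return line + b"\n"
```

Because BufferedLines depends only on read(n), the same layer would work
over any bottom object that honours that contract.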

The bottom of the stack maps directly onto Unix read/write system
calls but also onto recv/send socket system calls (and on Windows, the
filedescriptor spaces are different so a filedescriptor-based common
class can't work there). I wouldn't bother mapping datagram sockets to
the I/O stack; datagrams ought to be handled on a packet basis.
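
As a sketch of that mapping, assuming a hypothetical wrapper class
SocketRawIO and a helper write_all() (neither is an existing API): read()
maps to recv() and write() maps to send(), so the layers above never need
a file descriptor.

```python
import socket


class SocketRawIO:
    """Hypothetical bottom-of-stack object backed by a stream socket."""

    def __init__(self, sock):
        self._sock = sock

    def read(self, n):
        # At most n bytes; b"" means the peer closed the connection
        return self._sock.recv(n)

    def write(self, b):
        # May accept fewer bytes than len(b); returns the count written
        return self._sock.send(b)


def write_all(raw, data):
    """Loop over the partial writes that a raw write() is allowed to make."""
    view = memoryview(data)
    while len(view):
        written = raw.write(view)
        view = view[written:]
```

The partial-write loop lives one level up, which is where a buffering
layer would normally take care of it.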

--Guido

On 4/28/06, tomer filiba <tomerfiliba at gmail.com> wrote:
> i've seen discussion about improving the IO stack of python, to rely less on
> the low-level lib-c implementation. so, i wanted to share my ideas in
> that niche.
>
> i feel today's sockets are way *outdated* and overloaded. python's sockets are
> basically a wrapper for the low-level BSD sockets... but IMHO it would be much
> nicer to alleviate this dependency: expose a more high-level interface to socket
> programming. the *BSD-socket methodology* does not sit well with pythonic
> paradigms.
>
> let's start with {set/get}sockopt... that's one of the ugliest things in
> python, i believe most would agree. it's basically C programming in python.
> so, indeed, it's a way to overcome differences between platforms and
> protocols, but i believe it's not the way python should handle it.
>
> my suggestion is nothing "revolutionary". it's basically taking the existing
> socket module and extending it for most common use cases.
>
> there are two types of sockets, streaming and datagram. the underlying
> protocols don't matter. and these two types of sockets have different semantics
> to them: send/recv vs. sendto/recvfrom. so why not introduce a StreamSocket
> and DgramSocket types? and of course RawSocket should be introduced
> to complement them.
>
> you can argue that recvfrom and sendto can be used on streaming sockets
> as well, but did anyone ever use it? i never saw such code, and i can't think
> why you would want to use it.
>
> next, all the socket options would become properties or methods (i prefer
> properties). each protocol would subclass {Stream/Dgram/Raw}Socket
> and add its protocol-specific options.
>
> here's an example for a hierarchy:
> Socket
>     RawSocket
>     DgramSocket
>         UDPSocket
>     StreamSocket
>         TCPSocket
>             SSLSocket
>
> the above tree is only partial of course. but it needn't be complete, either.
> less used protocols, like X25 or ICMP could be constructed directly with the
> Socket class, in the old fashion of passing parameters. after all, the
> suggested class hierarchy only wraps the existing socket constructor and adds
> a more pythonic API to its options.
>
> here's an example:
> s = TCPSocket(AF_INET6)
> s.reuse_address = True # this option is inherited from Socket
> s.no_delay = True # this is a TCP-level option
> s.bind(("", 12345))
> s.listen(1)
> s2 = s.accept()
> s2.send("hello")
>
> or
> s = UDPSocket()
> s.allow_broadcast = True
> s.sendto("hello everybody", ("255.255.255.255", 12345))
>
> perhaps we should consider adding an "options" namespace, in order to
> keep the root level of the instance simpler. for example:
> s.options.reuse_address = True
>
> it clarifies that reuse_address is an option. is it necessary? donno.
>
> and since we can override bind(), perhaps we should override it to provide
> a more specific interface, i.e.
> def bind(self, addr, port):
>     super(self, ...).bind((addr, port))
>
> because we *know* it's a tcp socket, so we don't need to *retain support* for
> all addressing forms: it's an IP address and a port.
>
> ---
>
> i would also want to replace the current BSD semantics for *client sockets*,
> of first creating a socket and then connecting it, i.e.,
> s = socket()
> s.connect(("localhost", 80))
>
> i would prefer
> s = ConnectedSocket(("localhost", 80))
>
> because *connecting the socket* is part of *initializing* it, hence it should
> be part of the class' constructor, and not a separate phase of the socket's
> life.
>
> perhaps the syntax should be
> s = TCPSocket.connect(("localhost", 80))
> # or s = TCPSocket.connect("localhost", 80)
> # if we override connect()
>
> where <socketclass>.connect would be a classmethod, which returns a
> new instance of the class, connected to the server. of course DgramSockets
> don't need such a mechanism.
>
> i would like to suggest the same about connection-oriented server sockets,
> but the case with those is a little more complicated, and possibly
> asynchronous (select()ing before accept()ing), so i would retain the existing
> semantics.
>
> ---
>
> another thing i find quite silly is the way sockets behave on shutdown and
> in non-blocking mode.
>
> when the connection breaks, i would expect recv() to raise EOFError, or
> some sort of socket.error, instead of returning "". moreover, when i'm using
> a non-blocking recv(), and there's no data to return, i would expect "", not a
> socket.timeout exception.
>
> to sum it up:
> * no data = ""
> * connection breaks = EOFError
>
> the situation, however, is *exactly the opposite*. which is quite not intuitive
> or logical, and i remember having to write this code:
>
> def recv(s):
>     try:
>         data = s.recv(1000)
>         if not data: # socket closed
>             raise EOFError
>     except socket.timeout:
>         data = "" # timeout
>     return data
>
> to accumulate data from non-blocking sockets, in a friendly way.
>
> so yeah, the libsocket version of recv returns 0 on EOF and -1 with some
> errno when there's no data, but the pythonic version shouldn't just *copy*
> this behavior -- it should *translate* it to pythonic standards.
>
> you have to remember that libsocket and the rest were written in the 80's,
> and are very platform-dependent. plus, C doesn't allow multiple return values
> or exceptions, so they had to do it this way.
>
> the question that should guide you is, "if you were to write pythonic sockets,
> how would they look?" rather than "how do i write a 1:1 wrapper for libsocket?"
>
> ---
>
> by the way, a little cleanup:
>
> * why does accept return a tuple? instead of
> newsock, sockname = sock.accept()
>
> why not do
> newsock = sock.accept()
> sockname = newsock.getsockname()
>
> i'm always having strange bugs because i forget accept gives me a tuple rather
> than just a socket... and you don't generally need the sockname, especially
> since you can get it later with getsockname.
>
> * the host-to-network functions, are they needed? can't you just use struct.pack
> and unpack? why not throw them away?
>
> what do you say?
>
>
> -tomer

--
--Guido van Rossum (home page: http://www.python.org/~guido/)