[Python-3000] suggestion for a new socket io stack

Josiah Carlson jcarlson at uci.edu
Sat Apr 29 07:49:32 CEST 2006


Maybe it's just me, but I've previously implemented much of this with
asyncore subclasses.  While I didn't go all-out on the properties (I
never needed to change the underlying options beyond setting it as
non-blocking), I did end up with a fairly standard __init__ signature
that solved many of my annoyances.

class ...:
    def __init__(self, host, port, sock=None, factory=None,
                 family=AF_INET, type=SOCK_STREAM, proto=0):

Listening sockets:
    have host, port, and factory, but no sock
New incoming sockets:
    have host, port, and sock, but no factory
Outgoing sockets:
    have host and port, but no sock or factory
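
A minimal sketch of how that dispatch might look.  This is not my actual
asyncore code; the names socket_role, Endpoint and start are made up for
illustration, and a plain socket stands in for an asyncore.dispatcher.

```python
import socket


def socket_role(sock=None, factory=None):
    """Infer the role from which optional arguments were supplied."""
    if sock is not None and factory is not None:
        raise ValueError("pass either sock or factory, not both")
    if sock is not None:
        return "incoming"   # an already-accepted connection
    if factory is not None:
        return "listening"  # will accept() and hand new sockets to factory
    return "outgoing"       # will connect() to (host, port)


class Endpoint:
    """Hypothetical wrapper using the signature described above."""

    def __init__(self, host, port, sock=None, factory=None,
                 family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0):
        self.role = socket_role(sock, factory)
        self.host, self.port, self.factory = host, port, factory
        # Only create a new socket when one was not handed to us.
        if sock is None:
            sock = socket.socket(family, type, proto)
        self.sock = sock

    def start(self):
        """Bind/listen or connect, depending on the inferred role."""
        if self.role == "listening":
            self.sock.bind((self.host, self.port))
            self.sock.listen(5)
        elif self.role == "outgoing":
            self.sock.connect((self.host, self.port))
        # Incoming sockets are already connected; nothing to do.
```

Keeping the role check in a small function means the constructor can reject
nonsensical combinations (both sock and factory) in one place.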


It seems as though much of what Tomer suggests can be implemented with
subclassing of sockets.  When I first read his email, I started working
on a sample implementation of TCPSocket, but stopped when I started
using various metaprogramming tricks to ease the mapping of properties
to methods (I love using the Property metaclass).
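
For what it's worth, the property-to-sockopt mapping can be sketched without
any metaclass at all.  The following is only an illustration, assuming a
helper I'm calling sockopt_property (a made-up name) and treating every
option as a boolean flag; the attribute names are borrowed from Tomer's
examples.

```python
import socket


def sockopt_property(level, option):
    """Build a property that reads/writes one boolean socket option.

    Hypothetical helper: it only needs the owner to provide
    getsockopt() and setsockopt(), as socket.socket does.
    """
    def getter(self):
        return bool(self.getsockopt(level, option))

    def setter(self, value):
        self.setsockopt(level, option, int(bool(value)))

    return property(getter, setter)


class TCPSocket(socket.socket):
    # Option inherited by every socket type in Tomer's hierarchy.
    reuse_address = sockopt_property(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    # TCP-level option.
    no_delay = sockopt_property(socket.IPPROTO_TCP, socket.TCP_NODELAY)

    def __init__(self, family=socket.AF_INET):
        super().__init__(family, socket.SOCK_STREAM)
```

With that in place, s.no_delay = True is just a setsockopt() call in
disguise, which is part of why it still feels like BSD sockets underneath.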


While I think that sockets can be made easier to use in their creation
and setup (whether by subclass or by a factory function), that both
recv/send could use a bit of a change, and that the exception nonsense
Guido mentioned should certainly be cleaned up, I'm not sure how much
the other changes will really ease socket programming in general.  Why?
Once I wrote my own send/recv handling in my socket (or
asyncore.dispatcher) subclass, I got sane-enough-for-me behavior, and
had no issues implementing various line or packet based protocols.
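
To give an idea of the kind of recv handling I mean, here is a rough sketch
of a line-based reader.  LineBuffer and recv_lines are hypothetical names,
the "\r\n" terminator is an assumption, and the only thing required of the
socket is a recv() method that returns bytes.

```python
class LineBuffer:
    """Accumulate received chunks and hand back only complete lines."""

    def __init__(self, terminator=b"\r\n"):
        self.terminator = terminator
        self._pending = b""

    def feed(self, chunk):
        """Add received bytes; return a list of the complete lines so far."""
        self._pending += chunk
        # Everything before the last terminator is complete; the remainder
        # (possibly empty) waits for more data.
        *lines, self._pending = self._pending.split(self.terminator)
        return lines


def recv_lines(sock, buf, bufsize=4096):
    """Read once from sock and return any lines completed by that read.

    Raises EOFError when the peer has closed the connection, instead of
    quietly returning an empty result.
    """
    chunk = sock.recv(bufsize)
    if not chunk:
        raise EOFError("connection closed by peer")
    return buf.feed(chunk)
```

A packet-based protocol works the same way, with a length prefix taking the
place of the terminator.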

I certainly hope that people have been writing sane wrappers/subclasses,
but if not, then I guess that such changes would significantly ease
socket programming with Python (though even with the suggested changes,
it still feels like BSD sockets to me).

 - Josiah

"tomer filiba" <tomerfiliba at gmail.com> wrote:
> i've seen discussion about improving the IO stack of python, to rely less on
> the low-level lib-c implementation. so, i wanted to share my ideas in
> that niche.
> 
> i feel today's sockets are way *outdated* and overloaded. python's sockets are
> basically a wrapper for the low-level BSD sockets... but IMHO it would be much
> nicer to alleviate this dependency: expose a more high-level interface to socket
> programming. the *BSD-socket methodology* does not sit well with pythonic
> paradigms.
> 
> let's start with {set/get}sockopt... that's one of the ugliest things in
> python, i believe most would agree. it's basically C programming in python.
> so, indeed, it's a way to overcome differences between platforms and
> protocols, but i believe it's not the way python should handle it.
> 
> my suggestion is nothing "revolutionary". it's basically taking the existing
> socket module and extending it for most common use cases.
> 
> there are two types of sockets, streaming and datagram. the underlying
> protocols don't matter. and these two types of sockets have different semantics
> to them: send/recv vs. sendto/recvfrom. so why not introduce a StreamSocket
> and DgramSocket types? and of course RawSocket should be introduced
> to complement them.
> 
> you can argue that recvfrom and sendto can be used on streaming sockets
> as well, but did anyone ever use it? i never saw such code, and i can't think
> why you would want to use it.
> 
> next, all the socket options would become properties or methods (i prefer
> properties). each protocol would subclass {Stream/Dgram/Raw}Socket
> and add its protocol-specific options.
> 
> here's an example for a hierarchy:
> Socket
>     RawSocket
>     DgramSocket
>         UDPSocket
>     StreamSocket
>         TCPSocket
>             SSLSocket
> 
> the above tree is only partial of course. but it needn't be complete, either.
> less used protocols, like X25 or ICMP could be constructed directly with the
> Socket class, in the old fashion of passing parameters. after all, the
> suggested class hierarchy only wraps the existing socket constructor and adds
> a more pythonic API to its options.
> 
> here's an example:
> s = TCPSocket(AF_INET6)
> s.reuse_address = True # this option is inherited from Socket
> s.no_delay = True # this is a TCP-level option
> s.bind(("", 12345))
> s.listen(1)
> s2 = s.accept()
> s2.send("hello")
> 
> or
> s = UDPSocket()
> s.allow_broadcast = True
> s.sendto("hello everybody", ("255.255.255.255", 12345))
> 
> perhaps we should consider adding an "options" namespace, in order to
> keep the root level of the instance simpler. for example:
> s.options.reuse_address = True
> 
> it clarifies that reuse_address is an option. is it necessary? donno.
> 
> and since we can override bind(), perhaps we should override it to provide
> a more specific interface, i.e.
> def bind(self, addr, port):
>     super(TCPSocket, self).bind((addr, port))
> 
> because we *know* it's a tcp socket, so we don't need to *retain support* for
> all addressing forms: it's an IP address and a port.
> 
> ---
> 
> i would also want to replace the current BSD semantics for *client sockets*,
> of first creating a socket and then connecting it, i.e.,
> s = socket()
> s.connect(("localhost", 80))
> 
> i would prefer
> s = ConnectedSocket(("localhost", 80))
> 
> because *connecting the socket* is part of *initializing* it, hence it should
> be part of the class' constructor, and not a separate phase of the socket's
> life.
> 
> perhaps the syntax should be
> s = TCPSocket.connect(("localhost", 80))
> # or s = TCPSocket.connect("localhost", 80)
> # if we override connect()
> 
> where <socketclass>.connect would be a classmethod, which returns a
> new instance of the class, connected to the server. of course DgramSockets
> don't need such a mechanism.
> 
> i would like to suggest the same about connection-oriented server sockets,
> but the case with those is a little more complicated, and possibly
> asynchronous (select()ing before accept()ing), so i would retain the existing
> semantics.
> 
> ---
> 
> another thing i find quite silly is the way sockets behave on shutdown and
> in non-blocking mode.
> 
> when the connection breaks, i would expect recv() to raise EOFError, or
> some sort of socket.error, instead of returning "". moreover, when i'm using
> a non-blocking recv(), and there's no data to return, i would expect "", not a
> socket.timeout exception.
> 
> to sum it up:
> * no data = ""
> * connection breaks = EOFError
> 
> the situation, however, is *exactly the opposite*. which is quite not intuitive
> or logical, and i remember having to write this code:
> 
> def recv(s):
>     try:
>         data = s.recv(1000)
>         if not data: # socket closed
>             raise EOFError
>     except socket.timeout:
>         data = "" # timeout
>     return data
> 
> to accumulate data from non-blocking sockets, in a friendly way.
> 
> so yeah, the libsocket version of recv returns 0 on EOF and -1 with some
> errno when there's no data, but the pythonic version shouldn't just *copy*
> this behavior -- it should *translate* it to pythonic standards.
> 
> you have to remember that libsocket and the rest were written in the 80's,
> and are very platform-dependent. plus, C doesn't allow multiple return values
> or exceptions, so they had to do it this way.
> 
> the question that should guide you is, "if you were to write pythonic sockets,
> how would they look?" rather than "how do i write a 1:1 wrapper for libsocket?"
> 
> ---
> 
> by the way, a little cleanup:
> 
> * why does accept return a tuple? instead of
> newsock, peername = sock.accept()
> 
> why not do
> newsock = sock.accept()
> peername = newsock.getpeername()
> 
> i'm always having strange bugs because i forget accept gives me a tuple rather
> than just a socket... and you don't generally need the peer address, especially
> since you can get it later with getpeername.
> 
> * the host-to-network functions, are they needed? can't you just use struct.pack
> and unpack? why not throw them away?
> 
> what do you say?
> 
> 
> -tomer