[Python-Dev] Synchronous and Asynchronous servers in the standard
exarkun at divmod.com
Mon Nov 8 16:34:34 CET 2004
On Mon, 08 Nov 2004 06:50:25 +0100, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>Andrew Bennetts wrote:
> > One such list is here:
> > http://mail.zope.org/pipermail/zope3-dev/2002-October/003235.html
> I fail to see any of these as problematic:
Before continuing this discussion, it might be useful to define the goals for whatever ultimately ends up in the standard library. I believe different posters on this topic hold a lot of differing implicit assumptions, which will make it very difficult to reach anything resembling consensus if they are not resolved first.
Here are a few questions to get started (I'm sure there are more to be considered):
What level of code re-use is desired? Should a protocol implementation be portable between different frameworks (Remind anyone of PEP 333)? Should new classes be required for different transports? e.g., a class for SMTP/TCP, a class for SMTP/OpenSSL-TLS, a class for SMTP/TLSLite-TLS, a class for SMTP/TLSLite-SSL.
What level of performance is desirable? Is the aim something comparable to the threading and forking based servers already in the stdlib? Is it "sky's the limit"? Should it be platform independent?
Who is the target audience? Will this be something beginning network programmers should be able to pick up and use reasonably easily? Is that more important than allowing more complex software to be expressed easily? (Most of the rest of these questions are basically subsets of this one)
> 1. invokes readable/writable in each round, thereby not preserving
> state; presumably ok for select and poll, but bad (wasting
> performance for kqueue).
> I can't see this as a problem: asyncore does use select/poll,
> not kqueue. In doing so, it first processes all ready file
> descriptors before going to the next round, so in the next
> round, it needs to check all dispatchers again.
> There seems to be an implicit assertion that anything that uses
> select/poll must be evil, and the only true API is kqueue.
> I can't claim to understand the rationale for introducing
> kqueue in the first place, but if it is to improve performance,
> then I expect that any performance gained in kqueue over
> select goes away by using Python (e.g. adding an indirection
> in dispatching is probably more expensive than what kqueue
> would have saved).
KQueue's performance benefits over select() are absolutely noticeable from Python. We're talking orders of magnitude stuff here. select() is fine for small servers, and poll() is even a little better, but unless you want to impose harsh, arbitrary limits on the scalability of servers (and certain kinds of clients, e.g. aggregators) developed in Python, these shouldn't be the only two options. Even if KQueue isn't supported directly by the standard library, the mechanism should be amenable to support of it by third parties - this means an easily overridden notification call as well as supporting assumptions from the rest of the library. Calling a function once per connection per iteration should be avoided if at all possible (and it is possible).
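To make the "easily overridden notification call" idea concrete, here is a minimal sketch. It assumes the `selectors` module, which was added to the standard library long after this message was written; `serve_once` and `on_readable` are hypothetical names. Interest in each socket is registered once and kept across iterations, and callbacks run only for the sockets the OS reports as ready, rather than polling every connection on every pass.

```python
import selectors
import socket


def serve_once(selector, timeout=1.0):
    """Run callbacks only for the sockets the OS reports as ready."""
    handled = 0
    for key, _events in selector.select(timeout):
        callback = key.data  # the callback stored at registration time
        callback(key.fileobj)
        handled += 1
    return handled


received = []


def on_readable(sock):
    # Hypothetical handler: just record whatever arrived.
    received.append(sock.recv(1024))


# DefaultSelector picks kqueue or epoll where available, else poll/select.
selector = selectors.DefaultSelector()
busy_a, busy_b = socket.socketpair()
idle_a, idle_b = socket.socketpair()

# Registration happens once; it is not rebuilt on every loop iteration.
selector.register(busy_a, selectors.EVENT_READ, on_readable)
selector.register(idle_a, selectors.EVENT_READ, on_readable)

busy_b.sendall(b"ping")
handled = serve_once(selector)  # the idle connection costs no callback

selector.close()
for s in (busy_a, busy_b, idle_a, idle_b):
    s.close()
```

Swapping in a different notification mechanism then means passing a different selector object; the dispatch loop itself does not change.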
> 2. ties together protocol and transport. So what? I don't
> want to use SMTP over UDP, or X.25. TCP, possibly together
> with TLS, is just fine.
Keep in mind there is no support for SSL servers in the standard library (this is still true, right? I admit I haven't been paying much attention to this lately). So right off the bat, you _can't_ tie the transport to the protocol if you want to write an SSL server. It needs to be trivially parameterizable by a third party. Even ignoring that, there are several choices for SSL implementations in Python. If it can be avoided, I don't see any reason to require a separate class for each of them for each protocol.
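A rough sketch of what "trivially parameterizable" could look like follows. Every class name here is hypothetical and none of it is an existing stdlib or third-party API: the protocol only ever calls `write()` on whatever transport object it is handed, so a wrapping layer (standing in for TLS) can be supplied without writing a new protocol class.

```python
class GreetingProtocol:
    """Hypothetical protocol: knows the wire format, not the transport."""

    def __init__(self):
        self.transport = None
        self._buffer = b""

    def connection_made(self, transport):
        self.transport = transport
        transport.write(b"220 ready\r\n")

    def data_received(self, data):
        # Accumulate bytes and answer each complete CRLF-terminated line.
        self._buffer += data
        while b"\r\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\r\n", 1)
            self.transport.write(b"250 got " + line + b"\r\n")


class MemoryTransport:
    """Hypothetical transport that records what was written."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


class WrappingTransport:
    """Stand-in for a TLS layer: transforms bytes, then delegates."""

    def __init__(self, inner):
        self.inner = inner

    def write(self, data):
        # A real TLS layer would encrypt here; this marker is only a placeholder.
        self.inner.write(b"<enc>" + data)


# Same protocol class over a plain transport...
plain = MemoryTransport()
proto = GreetingProtocol()
proto.connection_made(plain)
proto.data_received(b"HELO a\r\nNO")
proto.data_received(b"OP\r\n")

# ...and over a wrapped one, with no protocol changes.
inner = MemoryTransport()
wrapped_proto = GreetingProtocol()
wrapped_proto.connection_made(WrappingTransport(inner))
```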
> 3. tied to sockets. Again: so what? I can't follow the assertion
> that you cannot use pyOpenSSL because of that, but then,
> I haven't used pyOpenSSL. The only possible interpretation
> is that pyOpenSSL does not expose "raw" (pollable) socket
> objects, which would really sound like a limitation in
> pyOpenSSL, not in asyncore.
The point isn't "tied to sockets" but "tied to the socket module". For a slightly less obvious example of why this is a drawback, consider that PyOpenSSL connections raise SSL.Error, not socket.error. So if you want to use PyOpenSSL, you don't just have to find and replace all of the uses of socket.socket(), but all of the uses of socket.error as well. Even worse, the behavior of SSL sockets doesn't entirely match the behavior of non-SSL sockets - select() can tell you a socket is ready for reading, and then a read can return '' because the available bytes were consumed entirely by SSL protocol-level negotiation. asyncore will interpret this to mean the socket has been closed by the peer and call handle_close()! So you really need an almost entirely different dispatcher to deal with SSL, and now we're back to the point above of having lots of different classes, one for each transport.
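One way a dispatcher could avoid both problems is sketched below, using entirely hypothetical names (this is not the asyncore or PyOpenSSL API). The transport itself says which exceptions are fatal, and it raises a distinct signal when bytes arrived but none of them were application data, so an empty read is never confused with a close.

```python
class NoDataYet(Exception):
    """Hypothetical signal: bytes arrived, but none were application data."""


class FakeTLSError(Exception):
    """Hypothetical stand-in for a transport-specific error class."""


class FakeTLSTransport:
    # Each transport advertises which exceptions mean the connection is dead,
    # so the dispatcher never hard-codes socket.error.
    fatal_errors = (FakeTLSError,)

    def __init__(self, events):
        self._events = list(events)

    def recv(self, size):
        event = self._events.pop(0)
        if event is None:
            # e.g. the readable bytes were consumed by protocol negotiation
            raise NoDataYet()
        if isinstance(event, Exception):
            raise event
        return event[:size]


def handle_read_event(transport, on_data, on_close):
    """Dispatch one readiness notification without assuming plain sockets."""
    try:
        data = transport.recv(4096)
    except NoDataYet:
        return  # not a close; wait for the next notification
    except transport.fatal_errors:
        on_close()
        return
    if data:
        on_data(data)
    else:
        on_close()  # a genuine empty read: the peer closed


log = []
transport = FakeTLSTransport([None, b"hello", b""])
for _ in range(3):
    handle_read_event(
        transport,
        lambda d: log.append(("data", d)),
        lambda: log.append(("close",)),
    )

error_log = []
broken = FakeTLSTransport([FakeTLSError("handshake failed")])
handle_read_event(broken, error_log.append, lambda: error_log.append("close"))
```

The first notification produces no callback at all, which is the case the paragraph above describes as being misreported as a close.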
> a. Cannot use NT I/O completion ports. Again, what's wrong
> with select? Performance? I'd really like to see a Python
> application where the speed gain from IOCP is significant
> compared to using select. The limit of 64 sockets is serious,
> but Python bumps it to 512.
Performance, yes. As Alex Martelli mentioned before me, IOCP also supports non-socket handles: stdin/stdout, for example, as well as handles to subprocesses. Again, since IOCP provides speedups comparable to KQueue when compared to select(), the speedups for large numbers of sockets are totally noticeable, even from sluggish old Python ;) Of course, you'd never notice them if your server could only handle 512 connections, but maybe you'd appreciate not having that limit, too :)
> b. cannot do the TCP transfer code in the C networking
> core. I don't really understand that point. What is the
> TCP transfer code?
This would involve rewriting all of the socket handling, buffer handling, and even dispatch as an extension module. Buffer handling is a real win here, since implementing it in Python involves a lot of wasteful string copying. Itamar (original poster on the zope list) is now employed writing a lot of code like this for Twisted. He has very high throughput requirements, and they'd be very difficult to achieve without the ability to replace this core component with his own code. Since Twisted _is_ architected in such a way that this code is all one unit, instead of spread out among dozens, hundreds, or thousands of instances, after rewriting just a few classes, essentially any Twisted application can benefit from his speedup.
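As an illustration of the copying cost only, and not of how Twisted or Itamar's code actually works, here is a small pure-Python sketch. A naive write buffer that does `buf = buf[sent:]` after each partial send copies the whole remainder every time; the hypothetical `ChunkedWriteBuffer` below instead keeps a queue of `memoryview` objects (a feature of later Python versions), whose slices do not copy the underlying bytes.

```python
from collections import deque


class ChunkedWriteBuffer:
    """Hypothetical write buffer that avoids re-copying unsent data."""

    def __init__(self):
        self._chunks = deque()

    def __bool__(self):
        return bool(self._chunks)

    def write(self, data):
        if data:
            self._chunks.append(memoryview(data))

    def flush(self, send):
        """Call send(view) until it accepts only part of a chunk or we run dry."""
        total = 0
        while self._chunks:
            view = self._chunks[0]
            sent = send(view)
            total += sent
            if sent < len(view):
                # Slicing a memoryview does not copy the underlying bytes.
                self._chunks[0] = view[sent:]
                break  # the transport is full for now
            self._chunks.popleft()
        return total


wire = []


def limited_send(view):
    # Hypothetical transport that accepts at most 4 bytes per call.
    chunk = bytes(view[:4])
    wire.append(chunk)
    return len(chunk)


buf = ChunkedWriteBuffer()
buf.write(b"hello ")
buf.write(b"world")

flushed = []
while buf:
    flushed.append(buf.flush(limited_send))
```

Because the buffer is a single component, replacing it with a faster implementation would not require touching the code that calls `write()`.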
Hope this clears some things up,