[Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs

Glyph Lefkowitz glyph@twistedmatrix.com
Mon, 28 Apr 2003 15:49:27 -0500


On Monday, April 28, 2003, at 11:00 AM, python-dev-request@python.org 
wrote:

>     Itamar> If this slowdown is confirmed, it is really not acceptable,
>     Itamar> since the change seems to have been made only to support making
>     Itamar> timeout sockets slightly easier to use.
>
> It was done to support making timeout sockets work properly.  As they
> existed previously, timeout sockets wouldn't work with protocols which would
> most likely use them: higher level modules such as httplib, which call
> sock.makefile(), then call readlines?() on the resulting file object.
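
In modern Python 3 terms, the usage pattern being described (a timeout set on the socket, then readline() on the file object returned by makefile()) looks roughly like the sketch below. A socketpair() stands in for a real HTTP server; this shows present-day behavior, not the 2.3 implementation under discussion.

```python
import socket

# A connected pair of sockets stands in for a real HTTP connection.
client, server = socket.socketpair()
client.settimeout(0.5)          # per-operation timeout, in seconds
reader = client.makefile("rb")  # file-like wrapper, as httplib uses

server.sendall(b"HTTP/1.0 200 OK\r\n")
status_line = reader.readline()  # data is already there, so this returns

try:
    reader.readline()            # nothing more was sent: this times out
    timed_out = False
except socket.timeout:
    timed_out = True

reader.close()
client.close()
server.close()
```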

Clearly this is a flaw in httplib's design.  Perhaps one should be able 
to pass in a socket or file factory?  That would allow speaking HTTP 
over non-TCP transports or through something like a SOCKS proxy, which 
is arguably a good thing.  Do you want to add SOCKS support by adding 
another wrapper around the socket module as well?  How about a Python 
software firewall?  Pretty soon our "correct" socket module will have 
20 performance-destroying wrappers around it in order to work around 
deficiencies in the interfaces of some programs which use sockets.

httplib imports the socket module and calls it directly, in a place
where accepting a factory function is the correct thing to do.  At
first it looks like you can parameterize it by hacking up the socket
module, but you can only do that once or twice before the design
problem really becomes pressing.
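
A minimal sketch of the factory approach, in modern Python. TinyHTTPClient and fake_connect are hypothetical names, not httplib's API: the client accepts a connect callable (defaulting to socket.create_connection), so a caller could plug in a SOCKS proxy, another transport, or, as in the usage lines, an in-memory socketpair(), without patching any module.

```python
import socket
from typing import Callable, Tuple

Address = Tuple[str, int]
# A connection factory takes an address and returns a connected socket.
ConnectFactory = Callable[[Address], socket.socket]


class TinyHTTPClient:
    """Hypothetical minimal HTTP/1.0 client; not httplib's real interface."""

    def __init__(self, host: str, port: int = 80,
                 connect: ConnectFactory = socket.create_connection):
        self.address = (host, port)
        self.connect = connect  # injected instead of calling socket directly

    def get_status_line(self, path: str = "/") -> bytes:
        sock = self.connect(self.address)
        try:
            request = f"GET {path} HTTP/1.0\r\nHost: {self.address[0]}\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            with sock.makefile("rb") as response:
                return response.readline()
        finally:
            sock.close()


# Usage: a hypothetical stand-in transport, no network involved.
peers = []


def fake_connect(address: Address) -> socket.socket:
    client, server = socket.socketpair()
    server.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
    peers.append(server)  # keep the other end open so sendall() succeeds
    return client


status = TinyHTTPClient("example.invalid", connect=fake_connect).get_status_line()
```

The transport decision moves to the caller; the client never needs to know whether timeouts, proxies, or test doubles are in play.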

The socket module is not a high-level interface to networking.  
Attempting to make it into one will harm its utility as a low-level 
interface that good high-level interfaces can be built on top of.

>     Itamar> Why should everyone have to pay a speed penalty just so a
>     Itamar> minority of people can skip calling a
>     Itamar> "socket.installtimeoutsupport()" at the beginning of their
>     Itamar> program? It's just one line of code they'd need to add.
>
> I think it would be easier for the minority of programs that care about the
> 20% performance loss to simply set

I think this should be in the release notes for 2.3.  "Python is 10% 
faster, unless you use sockets, in which case it is much, much slower.  
Do the following in order to regain lost performance and retain the 
same semantics:"

I anticipate that more than just Twisted will want to monkey-patch the
module.  (A 20% drop in throughput matters to far more than a niche
audience.)  If you're not going to fix this bug, maybe we could have a
"socket.monkeypatch()" function that would keep different systems from
stepping on each other when they do it?

> I don't know about you, but fast and incorrect don't help me much.

Since when is the behavior of the socket module "incorrect"?  If
anything, the interface to "timeout sockets" is incorrect, because BSD
sockets do not in fact support timeouts.  The interface does a lot of
work behind the user's back (putting the socket in non-blocking mode and
calling select() before each operation) that would be better done
another way, for example, with genuinely asynchronous networking.  It's
quite likely that there is some obscure corner case that the select()
in timeout sockets doesn't catch.
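
As a rough modern analogue of the asynchronous alternative (asyncio postdates this thread; the helper name and the local demo server are illustrative assumptions), the deadline is applied by the event loop to one operation with asyncio.wait_for(), rather than by a wrapper inside every socket call:

```python
import asyncio
from typing import Optional


async def read_line_with_deadline(reader: asyncio.StreamReader,
                                  seconds: float) -> Optional[bytes]:
    """Return one line, or None if nothing arrives within `seconds`."""
    try:
        return await asyncio.wait_for(reader.readline(), timeout=seconds)
    except asyncio.TimeoutError:
        return None


async def demo():
    async def handle(reader, writer):
        writer.write(b"hello\r\n")    # one line, then silence
        await writer.drain()
        await reader.read()           # wait until the client hangs up
        writer.close()
        await writer.wait_closed()

    # Illustrative local server on an OS-chosen port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    first = await read_line_with_deadline(reader, 0.5)   # the line arrives
    second = await read_line_with_deadline(reader, 0.2)  # nothing more comes

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return first, second


first, second = asyncio.run(demo())
```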

From a brief glance, internal_select ignores error return values, and
nothing checks its errno before making another socket call.  If I
remember correctly, that means that if select() fails with EINTR, the
following call to accept() or recv() or what-have-you may find the
socket not ready.  Since the socket is in non-blocking mode at this
point, that call does not block; it fails with EAGAIN/EWOULDBLOCK,
and Python raises an exception.  This is pretty hard to write a test for.
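
A sketch of the result-checking discipline being asked for, written in Python for illustration (recv_with_timeout is a hypothetical helper, not the socket module's code). Every outcome of select() is handled: EINTR waits again with the remaining time, an empty result is a timeout, and a spurious EAGAIN/EWOULDBLOCK from recv() goes back to select() instead of escaping as an error. Note that Python 3.5+ already retries EINTR inside select.select (PEP 475), so that branch only mirrors what C code has to do by hand.

```python
import select
import socket
import time


def recv_with_timeout(sock: socket.socket, size: int, timeout: float) -> bytes:
    """Hypothetical helper: recv() bounded by `timeout`, checking every result."""
    sock.setblocking(False)
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout("timed out")
        try:
            readable, _, _ = select.select([sock], [], [], remaining)
        except InterruptedError:
            # EINTR: select() said nothing about the socket; wait again
            # with whatever time is left.
            continue
        if not readable:
            continue  # select() timed out; the deadline check above raises
        try:
            return sock.recv(size)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: readiness was spurious; go back to select()
            continue


a, b = socket.socketpair()
a.sendall(b"ping")
data = recv_with_timeout(b, 16, 0.5)
try:
    recv_with_timeout(b, 16, 0.1)
    timed_out = False
except socket.timeout:
    timed_out = True
a.close()
b.close()
```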

I could be wrong about this particular error, but in general if one 
wishes to be pedantic about "correctness", one must first check the 
result codes from one's C system calls.

> Feel free to submit a patch which improves performance but maintains 
> proper behavior in the face of timeouts (that is, allows 
> test_urllibnet to still work correctly).

Why is the Python development team introducing bugs into Python and
then expecting the user community to fix things that used to work?  I
could understand not wanting to put a lot of effort into correcting
obscure or difficult-to-find performance problems that only a few
people care about, but the obvious thing to do in this case is simply
to change the default behavior back and make timeout support opt-in.