[Python-Dev] _socket efficiencies ideas

Marcus Mendenhall marcus.h.mendenhall@vanderbilt.edu
Tue, 8 Apr 2003 09:38:57 -0500


I have been in discussion recently with Martin v. Loewis about an idea 
I have been thinking about for a while to improve the efficiency of the 
connect method in the _socket module.  I posted the original suggestion 
to the python suggestions tracker on sourceforge as item 706392.

A bit of history and justification:
I am doing a lot of work using python to develop almost-real-time 
distributed data acquisition and control systems for running 
laboratory apparatus.  In this environment, I do a lot of sun-rpc calls 
as part of the vxi-11 protocol to allow TCP/IP access to gpib-like 
devices.  As a part of this, I do a lot of socket.connect() calls, 
often with the connections being quite transient.  The problem is that 
the current python _socket module makes a DNS call to try to resolve 
each address before connect is called, which, if I am 
connecting/disconnecting many times a second, results in pathological 
and gratuitous network activity.  Incidentally, I am in the process of 
creating a sourceforge project, pythonlabtools (just approved this 
morning), in which I will start maintaining a repository of the tools I 
have been working on.

My first solution to this, for which I submitted a patch to the tracker 
system (with guidance from Martin), was to create a wrapper for the 
sockaddr object, which one can create in advance, and when 
_socket.connect() is called (actually when getsockaddrarg() is called 
by connect), results in an immediate connection without any DNS 
activity.
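
As a rough Python-level illustration of the wrapper idea (the actual 
patch lives in the C code of _socket), the sketch below resolves an 
address once and reuses the result for every later connection.  The 
class name ResolvedAddress and its connect() method are hypothetical; 
they are not the interface of the submitted patch.

```python
import socket


class ResolvedAddress:
    """Hypothetical stand-in for the sockaddr wrapper: resolve once, reuse."""

    def __init__(self, host, port):
        # The only name lookup happens here, when the wrapper is created.
        info = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)[0]
        self.family, self.socktype, self.proto, _canon, self.sockaddr = info

    def connect(self, timeout=None):
        # Each call opens a fresh socket to the already-resolved address.
        s = socket.socket(self.family, self.socktype, self.proto)
        s.settimeout(timeout)
        try:
            s.connect(self.sockaddr)
        except OSError:
            s.close()
            raise
        return s
```

This only approximates the patch: the stored numeric address is still 
parsed by the module on every connect, whereas the wrapper described 
above hands a ready-made sockaddr to getsockaddrarg().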

This solution solves part of the problem, but may not be the right 
final one.  After writing this patch and verifying its functionality, I 
tried it in the real world.  Then, I realized that for sun-rpc work, it 
wasn't quite what I needed, since the socket number may be changing 
each time the rpc request is made, resulting in a new address wrapper 
being needed, and thus DNS activity again.

After thinking about what I have done with this patch, I would also 
like to suggest another change (for which I am also willing to submit 
the patch, which is quite simple):  Consistent with some of the already 
extant glue in _socket to handle addresses like <broadcast>, would 
there be any reason not to modify setipaddr() and getaddrinfo() so 
that, if an address is prefixed with <numeric> (e.g. 
<numeric>127.0.0.1), the PASSIVE and NUMERIC flags are always set, so 
these routines reject any non-numeric address but handle numeric ones 
very efficiently?
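
The intended behaviour can be approximated in pure Python, as a sketch 
only.  It assumes that the NUMERIC flag mentioned above corresponds to 
socket.AI_NUMERICHOST, and it leaves out the PASSIVE flag.  The helper 
name resolve_host is hypothetical, and the <numeric> prefix is the 
proposal under discussion, not an existing feature of the socket module.

```python
import socket

# Prefix proposed in this message; the real socket module does not know it.
NUMERIC_PREFIX = "<numeric>"


def resolve_host(host, port):
    """Return (family, sockaddr); a prefixed host must be a numeric address."""
    flags = 0
    if host.startswith(NUMERIC_PREFIX):
        host = host[len(NUMERIC_PREFIX):]
        # AI_NUMERICHOST tells getaddrinfo() not to attempt name resolution.
        flags = socket.AI_NUMERICHOST
    infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM, 0, flags)
    family, _type, _proto, _canon, sockaddr = infos[0]
    return family, sockaddr


# Usage: a numeric address is accepted, while a host name after the
# prefix raises socket.gaierror instead of triggering a lookup.
family, sockaddr = resolve_host("<numeric>127.0.0.1", 111)
```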

I have already implemented a predecessor to this which I am 
experimentally running at home in python 2.2.2, in which I made it so 
that prefixing the address with an exclamation point provided this 
functionality.  Given the somewhat more legible approach the team has 
already chosen for special addresses, I see no reason why using a 
<numeric> (or some such) prefix isn't reasonable.

Do any members of the development team have commentary on this?  Would 
such a change be likely to be accepted into the system?  Any reasons 
why it might break something?  The actual patch would be only about 
10 lines of code (plus some documentation), a few in each of the 
routines mentioned above.

Thanks for any suggestions.

Marcus Mendenhall