[Python-Dev] _socket efficiencies ideas
Marcus Mendenhall
marcus.h.mendenhall@vanderbilt.edu
Tue, 8 Apr 2003 09:38:57 -0500
I have been in discussion recently with Martin v. Loewis about an idea
I have been thinking about for a while to improve the efficiency of the
connect method in the _socket module. I posted the original suggestion
to the python suggestions tracker on sourceforge as item 706392.
A bit of history and justification:
I am doing a lot of work using python to develop almost-real-time
distributed data acquisition and control systems for running
laboratory apparatus. In this environment, I do a lot of sun-rpc calls
as part of the vxi-11 protocol to allow TCP/IP access to gpib-like
devices. As a part of this, I do a lot of socket.connect() calls,
often with the connections being quite transient. The problem is that
the current python _socket module makes a DNS call to try to resolve
each address before connect is called, which if I am
connecting/disconnecting many times a second results in pathological
and gratuitous network activity. Incidentally, I am in the process of
creating a sourceforge project, pythonlabtools (just approved this
morning), in which I will start maintaining a repository of the tools I
have been working on.
My first solution to this, for which I submitted a patch to the tracker
system (with guidance from Martin), was to create a wrapper for the
sockaddr object, which one can create in advance, and when
_socket.connect() is called (actually when getsockaddrarg() is called
by connect), results in an immediate connection without any DNS
activity.
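As a rough illustration of the "resolve once, connect many times" idea, here is a pure-Python sketch. It is not the patch itself (the patch wraps the sockaddr at the C level); the helper name resolve_once is hypothetical, and whether connect() then skips the resolver for a numeric string depends on the platform and Python version.

```python
import socket

def resolve_once(host, port):
    """Resolve host a single time and return a numeric (ip, port) tuple.

    Hypothetical helper for illustration; it only approximates the
    sockaddr wrapper described above.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, socktype, proto, canonname, sockaddr);
    # keep the sockaddr of the first result.
    return infos[0][4]

def connect_cached(addr):
    """Open a new TCP connection to an already-resolved (ip, port) tuple."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(addr)
    return s
```

The caller would run resolve_once() at startup and pass the resulting tuple to connect_cached() for every transient connection.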
This solution solves part of the problem, but may not be the right
final one. After writing this patch and verifying its functionality, I
tried it in the real world. Then, I realized that for sun-rpc work, it
wasn't quite what I needed, since the port number may be changing
each time the rpc request is made, resulting in a new address wrapper
being needed, and thus DNS activity again.
After thinking about what I have done with this patch, I would also
like to suggest another change (for which I am also willing to submit
the patch, which is quite simple): Consistent with some of the already
extant glue in _socket to handle addresses like <broadcast>, would
there be any reason not to modify setipaddr() and getaddrinfo() so that,
if an address is prefixed with <numeric> (e.g. <numeric>127.0.0.1), the
PASSIVE and NUMERIC flags are always set, so that these routines reject
any non-numeric address but handle numeric ones very efficiently?
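To make the proposed behaviour concrete, here is a small Python-level sketch of what the prefix would do. The real change would live in the C code of _socket; the function name lookup and the constant NUMERIC_PREFIX are hypothetical, and only the numeric-host flag is shown here, since the passive flag matters mainly when no host is given.

```python
import socket

NUMERIC_PREFIX = "<numeric>"  # hypothetical prefix from the proposal

def lookup(host, port):
    """Resolve host, refusing any name lookup when the prefix is present."""
    flags = 0
    if host.startswith(NUMERIC_PREFIX):
        host = host[len(NUMERIC_PREFIX):]
        # AI_NUMERICHOST tells getaddrinfo() to accept only numeric
        # address strings and never consult a name service.
        flags = socket.AI_NUMERICHOST
    return socket.getaddrinfo(host, port, socket.AF_INET,
                              socket.SOCK_STREAM, 0, flags)
```

With this sketch, lookup("<numeric>127.0.0.1", 111) succeeds without any name lookup, while lookup("<numeric>localhost", 111) raises socket.gaierror instead of falling back to DNS.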
I have already implemented a predecessor to this which I am
experimentally running at home in python 2.2.2, in which I made it so
that prefixing the address with an exclamation point provided this
functionality. Given the somewhat more legible approach the team has
already chosen for special addresses, I see no reason why using a
<numeric> (or some such) prefix isn't reasonable.
Do any members of the development team have commentary on this? Would
such a change be likely to be accepted into the system? Any reasons
why it might break something? The actual patch would be only about
10 lines of code (plus some documentation), a few in each of the
routines mentioned above.
Thanks for any suggestions.
Marcus Mendenhall