[Python-Dev] _socket efficiencies ideas
Guido van Rossum
guido@python.org
Tue, 08 Apr 2003 10:50:50 -0400
> I have been in discussion recently with Martin v. Loewis about an idea
> I have been thinking about for a while to improve the efficiency of the
> connect method in the _socket module. I posted the original suggestion
> to the python suggestions tracker on sourceforge as item 706392.
>
> A bit of history and justification:
> I am doing a lot of work using python to develop almost-real-time
> distributed data acquisition and control systems for running
> laboratory apparatus. In this environment, I do a lot of sun-rpc calls
> as part of the vxi-11 protocol to allow TCP/IP access to gpib-like
> devices. As a part of this, I do a lot of socket.connect() calls,
> often with the connections being quite transient. The problem is that
> the current python _socket module makes a DNS call to try to resolve
> each address before connect is called, which if I am
> connecting/disconnecting many times a second results in pathological
> and gratuitous network activity. Incidentally, I am in the process of
> creating a sourceforge project, pythonlabtools (just approved this
> morning), in which I will start maintaining a repository of the tools I
> have been working on.
Are you sure that it tries to make a DNS call even when the address is
pure numeric? That seems a mistake, and if that's really happening, I
think that is the part that should be fixed. Maybe in the _socket
module, maybe in getaddrinfo().
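One way to check this from Python is to ask getaddrinfo() for a
numeric-only lookup. The following is a minimal sketch, not the
_socket implementation: it assumes the platform exposes the
AI_NUMERICHOST flag through the socket module, and the helper name
numeric_sockaddr is made up for illustration.

```python
import socket

def numeric_sockaddr(host, port):
    """Return an IPv4 (host, port) sockaddr for a numeric host string.

    AI_NUMERICHOST asks getaddrinfo() to refuse anything that is not a
    numeric address, so no name lookup should be attempted; a host name
    raises socket.gaierror instead.
    """
    infos = socket.getaddrinfo(
        host, port,
        socket.AF_INET, socket.SOCK_STREAM, 0,
        socket.AI_NUMERICHOST,
    )
    # Each entry is (family, type, proto, canonname, sockaddr).
    return infos[0][4]
```

If a plain numeric string passed without the flag still produces
resolver traffic, that would point at the lookup path rather than at
the caller.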
> My first solution to this, for which I submitted a patch to the tracker
> system (with guidance from Martin), was to create a wrapper for the
> sockaddr object, which one can create in advance, and when
> _socket.connect() is called (actually when getsockaddrarg() is called
> by connect), results in an immediate connection without any DNS
> activity.
>
> This solution solves part of the problem, but may not be the right
> final one. After writing this patch and verifying its functionality, I
> tried it in the real world. Then, I realized that for sun-rpc work, it
> wasn't quite what I needed, since the socket number may be changing
> each time the rpc request is made, resulting in a new address wrapper
> being needed, and thus DNS activity again.
>
> After thinking about what I have done with this patch, I would also
> like to suggest another change (for which I am also willing to submit
> the patch, which is quite simple): Consistent with some of the already
> extant glue in _socket to handle addresses like <broadcast>, would
> there be any reason not to modify setipaddr() and getaddrinfo() so that,
> if an address is prefixed with <numeric> (e.g. <numeric>127.0.0.1), the
> PASSIVE and NUMERIC flags
> are always set so these routines reject any non-numeric address, but
> handle numeric ones very efficiently?
>
> I have already implemented a predecessor to this which I am
> experimentally running at home in python 2.2.2, in which I made it so
> that prefixing the address with an exclamation point provided this
> functionality. Given the somewhat more legible approach the team has
> already chosen for special addresses, I see no reason why using a
> <numeric> (or some such) prefix isn't reasonable.
>
> Do any members of the development team have commentary on this? Would
> such a change be likely to be accepted into the system? Any reasons
> why it might break something? The actual patch would be only about
> 10 lines of code (plus some documentation), a few in each of the
> routines mentioned above.
I don't see why we would have to add the <numeric> flag to the address
when the form of the address itself is already a perfect clue that the
address is purely numeric. I'd be happy to see a patch that
intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those
without calling getaddrinfo().
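The interception described above could look roughly like the sketch
below. It is written in Python for illustration only; a real patch
would live in the C code of the _socket module. The name
parse_dotted_quad and the choice to return None for anything that is
not a valid dotted quad are assumptions, not part of any existing API.

```python
import re

# Four groups of ASCII digits separated by dots, nothing else.
_DOTTED_QUAD = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", re.ASCII)

def parse_dotted_quad(host):
    """Return the 4 packed address bytes, or None if host is not numeric.

    A None result means the caller should fall back to the normal
    getaddrinfo() path.
    """
    m = _DOTTED_QUAD.fullmatch(host)
    if m is None:
        return None
    octets = [int(group) for group in m.groups()]
    if any(octet > 255 for octet in octets):
        return None  # looks numeric, but is not a valid IPv4 address
    return bytes(octets)
```

For example, parse_dotted_quad("127.0.0.1") yields the same four bytes
that socket.inet_aton("127.0.0.1") produces, while a host name such as
"www.python.org" yields None and would still go through the resolver.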
--Guido van Rossum (home page: http://www.python.org/~guido/)