[Python-Dev] _socket efficiencies ideas

Guido van Rossum guido@python.org
Tue, 08 Apr 2003 10:50:50 -0400

> I have been in discussion recently with Martin v. Loewis about an idea 
> I have been thinking about for a while to improve the efficiency of the 
> connect method in the _socket module.  I posted the original suggestion 
> to the python suggestions tracker on sourceforge as item 706392.
> A bit of history and justification:
> I am doing a lot of work using python to develop almost-real-time 
> distributed data acquisition and control systems for running 
> laboratory apparatus.  In this environment, I do a lot of sun-rpc calls 
> as part of the vxi-11 protocol to allow TCP/IP access to gpib-like 
> devices.  As a part of this, I do a lot of socket.connect() calls, 
> often with the connections being quite transient. The problem is that 
> the current python _socket module makes a DNS call to try to resolve 
> each address before connect is called, which if I am 
> connecting/disconnecting many times a second results in pathological 
> and gratuitous network activity.  Incidentally, I am in the process of 
> creating a sourceforge project, pythonlabtools (just approved this 
> morning), in which I will start maintaining a repository of the tools I 
> have been working on.

Are you sure that it tries to make a DNS call even when the address is
pure numeric?  That seems a mistake, and if that's really happening, I
think that is the part that should be fixed.  Maybe in the _socket
module, maybe in getaddrinfo().

> My first solution to this, for which I submitted a patch to the tracker 
> system (with guidance from Martin), was to create a wrapper for the 
> sockaddr object, which one can create in advance, and when 
> _socket.connect() is called (actually when getsockaddrarg() is called 
> by connect), results in an immediate connection without any DNS 
> activity.
> This solution solves part of the problem, but may not be the right 
> final one.  After writing this patch and verifying its functionality, I 
> tried it in the real world.  Then, I realized that for sun-rpc work, it 
> wasn't quite what I needed, since the socket number may be changing 
> each time the rpc request is made, resulting in a new address wrapper 
> being needed, and thus DNS activity again.
> After thinking about what I have done with this patch, I would also 
> like to suggest another change (for which I am also willing to submit 
> the patch, which is quite simple):  Consistent with some of the already 
> extant glue in _socket to handle addresses like <broadcast>, would 
> there be any reason not to modify
> setipaddr() and getaddrinfo() so that if an address is prefixed with 
> <numeric> (e.g. <numeric> that the PASSIVE and NUMERIC flags 
> are always set so these routines reject any non-numeric address, but 
> handle numeric ones very efficiently?
> I have already implemented a predecessor to this which I am 
> experimentally running at home in python 2.2.2, in which I made it so 
> that prefixing the address with an exclamation point provided this 
> functionality.  Given the somewhat more legible approach the team has 
> already chosen for special addresses, I see no reason why using a 
> <numeric> (or some such) prefix isn't reasonable.
> Do any members of the development team have commentary on this?  Would 
> such a change be likely to be accepted into the system?  Any reasons 
> why it might break something?  The actual patch would be only about 
> 10 lines of code (plus some documentation), a few in each of the 
> routines mentioned above.
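
The <numeric> prefix proposed in the quoted message can be illustrated at
the Python level.  This is only a rough sketch, not the C patch under
discussion: the helper name lookup() and the NUMERIC_PREFIX constant are
hypothetical, and the sketch sets only socket.AI_NUMERICHOST (assumed here
to be the "NUMERIC" flag the message refers to), leaving the PASSIVE flag
out.

```python
import socket

# Hypothetical marker, modelled on the existing "<broadcast>" special address.
NUMERIC_PREFIX = "<numeric>"

def lookup(host, port):
    """Resolve host; a "<numeric>" prefix forbids any name lookup."""
    flags = 0
    if host.startswith(NUMERIC_PREFIX):
        host = host[len(NUMERIC_PREFIX):]
        # AI_NUMERICHOST makes getaddrinfo() reject non-numeric hosts
        # with socket.gaierror instead of consulting the resolver.
        flags = socket.AI_NUMERICHOST
    return socket.getaddrinfo(host, port, socket.AF_INET,
                              socket.SOCK_STREAM, 0, flags)
```

With the prefix, a numeric address is parsed directly and a host name
raises socket.gaierror; without the prefix, behaviour is unchanged.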

I don't see why we would have to add the <numeric> flag to the address
when the form of the address itself is already a perfect clue that the
address is purely numeric.  I'd be happy to see a patch that
intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those
without calling getaddrinfo().
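
As a rough Python-level sketch of that interception (the real change would
be in the C code of the _socket module, and the helper names
parse_dotted_quad() and resolve_host() are hypothetical), a dotted-quad
address can be recognised and packed without touching the resolver.  The
range check on each octet is an added assumption; the pattern alone would
also accept values such as 999.

```python
import re
import socket

# The pattern from the message: four dot-separated runs of ASCII digits.
_DOTTED_QUAD = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", re.ASCII)

def parse_dotted_quad(host):
    """Return the 4 packed address bytes, or None if host is not numeric."""
    m = _DOTTED_QUAD.fullmatch(host)
    if m is None:
        return None
    octets = [int(g) for g in m.groups()]
    if any(o > 255 for o in octets):
        return None  # looks numeric but is not a valid IPv4 address
    return bytes(octets)

def resolve_host(host):
    """Resolve host to an IPv4 string, skipping DNS for numeric input."""
    packed = parse_dotted_quad(host)
    if packed is not None:
        return socket.inet_ntoa(packed)  # no resolver involved
    return socket.gethostbyname(host)    # fall back to a normal lookup
```

Only non-numeric names reach gethostbyname(), so repeated connects to a
numeric address cause no name-service traffic in this sketch.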

--Guido van Rossum (home page: http://www.python.org/~guido/)