[Python-Dev] _socket efficiencies ideas

Tue, 8 Apr 2003 10:59:27 -0500

Thanks for your prompt reply!

On Tuesday, April 8, 2003, at 09:50 AM, Guido van Rossum wrote:

>> I have been in discussion recently with Martin v. Loewis about an idea
>> I have been thinking about for a while to improve the efficiency of the
>> connect method in the _socket module.  I posted the original suggestion
>> to the python suggestions tracker on sourceforge as item 706392.
>>
>> A bit of history and justification:
>> I am doing a lot of work using python to develop almost-real-time
>> distributed data acquisition and control systems for running
>> laboratory apparatus.  In this environment, I do a lot of sun-rpc calls
>> as part of the vxi-11 protocol to allow TCP/IP access to gpib-like
>> devices.  As a part of this, I do a lot of socket.connect() calls,
>> often with the connections being quite transient.  The problem is that
>> the current python _socket module makes a DNS call to try to resolve
>> each address before connect is called, which, if I am
>> connecting/disconnecting many times a second, results in pathological
>> and gratuitous network activity.  Incidentally, I am in the process of
>> creating a sourceforge project, pythonlabtools (just approved this
>> morning), in which I will start maintaining a repository of the tools I
>> have been working on.
>
> Are you sure that it tries to make a DNS call even when the address is
> pure numeric?  That seems a mistake, and if that's really happening, I
> think that is the part that should be fixed.  Maybe in the _socket
> module, maybe in getaddrinfo().
>
Yes, it seems to do this.  It sets the PASSIVE flag, but that doesn't
seem to be quite enough to prevent DNS activity, although the NUMERIC
flag does the job.  This is true, at least, in 2.3.x on MacOSX, and 
since the socket stuff is all the same, I suspect it is true on many 
Unixes.  Note that this doesn't happen on the MacOS9 version, which 
provides its own socket interface through GUSI, which apparently is 
smart enough to handle it.
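
For illustration, the effect of the NUMERIC flag can be sketched from Python itself. This is only a sketch under assumptions: it uses the `socket.getaddrinfo()` function and `socket.AI_NUMERICHOST` constant of current Python rather than the C code in `_socket` discussed here, and the helper name `resolve_numeric` is hypothetical.

```python
import socket


def resolve_numeric(host, port):
    """Return sockaddr tuples for a literal address, never consulting DNS.

    AI_NUMERICHOST asks getaddrinfo() to accept only numeric address
    strings, so a host name is rejected instead of being looked up.
    """
    infos = socket.getaddrinfo(
        host,
        port,
        type=socket.SOCK_STREAM,
        flags=socket.AI_NUMERICHOST,
    )
    # Each entry is (family, type, proto, canonname, sockaddr).
    return [info[4] for info in infos]


# A numeric address is accepted and parsed locally.
print(resolve_numeric("127.0.0.1", 111))

# A host name is refused outright rather than resolved.
try:
    resolve_numeric("localhost", 111)
except socket.gaierror as exc:
    print("rejected:", exc)
```

The point of the sketch is that the flag turns a name into an error instead of a lookup, which is the behavior wanted for many short-lived connections to known addresses.
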
>> My first solution to this, for which I submitted a patch to the tracker
>> system (with guidance from Martin), was to create a wrapper for the
>> sockaddr object, which one can create in advance, and when
>> _socket.connect() is called (actually when getsockaddrarg() is called
>> by connect), results in an immediate connection without any DNS
>> activity.
>>
>> This solution solves part of the problem, but may not be the right
>> final one.  After writing this patch and verifying its functionality, I
>> tried it in the real world.  Then, I realized that for sun-rpc work, it
>> wasn't quite what I needed, since the socket number may be changing
>> each time the rpc request is made, resulting in a new address wrapper
>> being needed, and thus DNS activity again.
>>
>> After thinking about what I have done with this patch, I would also
>> like to suggest another change (for which I am also willing to submit
>> the patch, which is quite simple):  Consistent with some of the already
>> extant glue in _socket to handle addresses like <broadcast>, would
>> there be any reason not to modify
>> setipaddr() and getaddrinfo() so that if an address is prefixed with
>> <numeric> (e.g. <numeric>127.0.0.1), the PASSIVE and NUMERIC flags
>> are always set so these routines reject any non-numeric address, but
>> handle numeric ones very efficiently?
>>
>> I have already implemented a predecessor to this which I am
>> experimentally running at home in python 2.2.2, in which I made it so
>> that prefixing the address with an exclamation point provided this
>> functionality.  Given the somewhat more legible approach the team has
>> already chosen for special addresses, I see no reason why using a
>> <numeric> (or some such) prefix isn't reasonable.
>>
>> Do any members of the development team have commentary on this?  Would
>> such a change be likely to be accepted into the system?  Any reasons
>> why it might break something?  The actual patch would be only about
>> 10 lines of code (plus some documentation), a few in each of the
>> routines mentioned above.
>
> I don't see why we would have to add the <numeric> flag to the address
> when the form of the address itself is already a perfect clue that the
> address is purely numeric.  I'd be happy to see a patch that
> intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those
> without calling getaddrinfo().
>
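
A rough Python sketch of the interception suggested above, for illustration only. The real change would live in the C code of `_socket`; the helper name `parse_ipv4_literal` is hypothetical, and `socket.inet_aton()` stands in for whatever parsing the patch would actually use.

```python
import re
import socket

# The pattern from the suggestion above; re.ASCII restricts \d to 0-9.
DOTTED_QUAD = re.compile(r"\d+\.\d+\.\d+\.\d+", re.ASCII)


def parse_ipv4_literal(host):
    """Return the packed 4-byte address for a dotted quad, else None.

    None means "not a numeric IPv4 literal", so the caller would fall
    back to the normal getaddrinfo() path.
    """
    if DOTTED_QUAD.fullmatch(host) is None:
        return None
    try:
        return socket.inet_aton(host)
    except OSError:
        # Right shape but not a valid address, e.g. an octet above 255.
        return None


print(parse_ipv4_literal("127.0.0.1"))
print(parse_ipv4_literal("localhost"))
```

Note that the pattern alone is not sufficient: a string can match it and still not be a valid address, so the parse step has to be allowed to fail.
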
Do we want this?  The parser would then also have to be modified to
handle numeric INET6 addresses when they become popular.  I actually
did implement one of my trial versions this way, and it worked fine.
There is one minor issue, too.  In urllib, there are some calls to
getaddrinfo to get (for maybe no good reason) CNAMEs of addresses.  I
would like some way to tag an address with a very strong comment that
it is what it is, and I would like all further processing disabled.
Also, a 'trial' parsing of an address for matching an a.b.c.d pattern
each time is a lot more processor intensive than checking for <numeric>
at the beginning.
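
To make the prefix alternative concrete, here is a minimal Python sketch of it. It is an assumption-laden illustration, not the patch: the `<numeric>` tag is only the proposal from this thread, the function name `lookup` is hypothetical, and only the numeric flag is shown (via `socket.AI_NUMERICHOST` in current Python), not the PASSIVE flag.

```python
import socket

NUMERIC_PREFIX = "<numeric>"  # hypothetical tag proposed in this thread


def lookup(host, port):
    """Resolve host, skipping DNS entirely when it carries the tag."""
    flags = 0
    if host.startswith(NUMERIC_PREFIX):
        # One cheap prefix test; the address itself is never inspected here.
        host = host[len(NUMERIC_PREFIX):]
        flags = socket.AI_NUMERICHOST
    return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM, flags=flags)


# Tagged numeric address: parsed locally, no resolver involved.
print(lookup("<numeric>127.0.0.1", 111)[0][4])

# Tagged non-numeric address: rejected instead of being looked up.
try:
    lookup("<numeric>localhost", 111)
except socket.gaierror as exc:
    print("rejected:", exc)
```

Because the tag delegates the actual parsing to getaddrinfo(), the same code path would cover numeric INET6 addresses without a second pattern.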

I am perfectly happy to implement it either way.

> --Guido van Rossum (home page: http://www.python.org/~guido/)
>