[Python-bugs-list] [ python-Bugs-539175 ] resolver not thread safe

noreply@sourceforge.net noreply@sourceforge.net
Fri, 05 Apr 2002 13:44:32 -0800


Bugs item #539175, was opened at 2002-04-04 01:54
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=539175&group_id=5470

Category: Threads
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: dustin sallings (dustin)
Assigned to: Nobody/Anonymous (nobody)
Summary: resolver not thread safe

Initial Comment:
I've got an application that does SNMP monitoring and
has a thread listening with SimpleXMLRPCServer for
remote control.  I noticed the XMLRPC listener logging
an incorrect address while snmp jobs were processing:

sw1.west.spy.net - - [04/Apr/2002 01:16:37] "POST /RPC2 HTTP/1.0" 200 -
localhost.west.spy.net - - [04/Apr/2002 01:16:43] "POST /RPC2 HTTP/1.0" 200 -

sw1 is one of the machines that is being queried, but
the XMLRPC requests are happening over localhost.

gethostbyname() and gethostbyaddr() both return static
data, thus they aren't reentrant.

As a workaround, I copied socket.py to my working
directory and added the following to it:

try:
    import threading
except ImportError, ie:
    sys.stderr.write(str(ie) + "\n")

# mutex for DNS lookups
__dns_mutex=None
try:
    __dns_mutex=threading.Lock()
except NameError:
    pass

def __lock():
    if __dns_mutex!=None:
        __dns_mutex.acquire()
    
def __unlock():
    if __dns_mutex!=None:
        __dns_mutex.release()
    
def gethostbyaddr(addr):
    """Override gethostbyaddr to try to get some thread safety."""
    # Take the lock before entering try, so that a failed acquire
    # never reaches the release in the finally clause.
    __lock()
    try:
        rv=_socket.gethostbyaddr(addr)
    finally:
        __unlock()
    return rv

def gethostbyname(name):
    """Override gethostbyname to try to get some thread safety."""
    # Same lock as gethostbyaddr: the two calls must not overlap.
    __lock()
    try:
        rv=_socket.gethostbyname(name)
    finally:
        __unlock()
    return rv


----------------------------------------------------------------------

>Comment By: dustin sallings (dustin)
Date: 2002-04-05 13:44

Message:
Logged In: YES 
user_id=43919

I first noticed this problem on my OS X box.

Since it's affecting me, it's not obvious to anyone else,
and I'm perfectly capable of fixing it myself, I'll try to
spend some time figuring out what's going on this weekend. 
It seems the socket module might be deciding at compile time
not to use the lock.  I will investigate further and submit
a patch.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-04-05 13:31

Message:
Logged In: YES 
user_id=31435

Just a reminder that the first thing to try on any SGI box 
is to recompile Python with optimization disabled.  I can't 
remember the last time we had "a Python bug" on SGI that 
wasn't traced to a compiler -O bug.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-05 00:56

Message:
Logged In: YES 
user_id=21627

Can you spot the error in the Python socket module? I still
fail to see our bug, and I would assume it is a C library
bug; I also cannot reproduce the problem on any of my machines.

Can you please report the settings of the various HAVE_
defines for irix?

----------------------------------------------------------------------

Comment By: dustin sallings (dustin)
Date: 2002-04-04 14:21

Message:
Logged In: YES 
user_id=43919

Looking over the code a bit more, I see that my last message
wasn't entirely accurate.  There does seem to be only one
lock for both gethostbyname and gethostbyaddr
(gethostbyname_lock is used for both).

This is a pretty simple test that illustrates the problem
I'm seeing.  My previous work was on my OS X machine, but
this is Python 2.2 (#3, Mar  6 2002, 18:30:37) [C] on irix6.

#!/usr/bin/env python
#
# Copyright (c) 2002  Dustin Sallings <dustin@spy.net>
# $Id$

import threading
import socket
import time

class ResolveMe(threading.Thread):

    hosts=['propaganda.spy.net', 'bleu.west.spy.net',
           'mail.west.spy.net']

    def __init__(self):
        threading.Thread.__init__(self)
        self.setDaemon(1)

    def run(self):
        # Run 100 times
        for i in range(100):
            for h in self.hosts:
                nrv=socket.gethostbyname_ex(h)
                arv=socket.gethostbyaddr(nrv[2][0])

                try:
                    # Verify the hostname is correct
                    assert(h==nrv[0])
                    # Verify the two hostnames match
                    assert(nrv[0]==arv[0])
                    # Verify the two addresses match
                    assert(nrv[2]==arv[2])
                except AssertionError:
                    print "Failed!  Checking " + `h` + " got, " \
                        + `nrv` + " and " + `arv`

if __name__=='__main__':
    for i in range(1,10):
        print "Starting " + `i` + " threads."
        threads=[]
        for n in range(i):
            rm=ResolveMe()
            rm.start()
            threads.append(rm)
        for t in threads:
            t.join()
        print `i` + " threads complete."
    time.sleep(60)

The output looks like this:

verde:/tmp 190> ./pytest.py
Starting 1 threads.
1 threads complete.
Starting 2 threads.
Failed!  Checking 'propaganda.spy.net' got,
('mail.west.spy.net', [], ['66.149.231.226']) and
('mail.west.spy.net', [], ['66.149.231.226'])
Failed!  Checking 'bleu.west.spy.net' got,
('mail.west.spy.net', [], ['66.149.231.226']) and
('mail.west.spy.net', [], ['66.149.231.226'])

[...]

----------------------------------------------------------------------

Comment By: dustin sallings (dustin)
Date: 2002-04-04 13:08

Message:
Logged In: YES 
user_id=43919

The XMLRPC request is clearly being logged as coming from my
cisco switch when it was, in fact, coming from localhost.

I can't find any clear documentation, but it seems that on
at least some systems gethostbyname and gethostbyaddr
reference the same static variable, so having separate locks
for each one (as seen in socketmodule.c) doesn't help
anything.  It's not so much that they're not reentrant, but
you can't call any combination of the two of them at the
same time.  Here's some test code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <assert.h>

int main(int argc, char **argv) {
    struct hostent *byaddr, *byname;
    unsigned int addr;
    struct sockaddr *sa = (struct sockaddr *)&addr;

    /* 0x4295E7E3, i.e. 66.149.231.227 on a big-endian host */
    addr=1117120483;

    byaddr=gethostbyaddr(sa, sizeof(addr), AF_INET);
    assert(byaddr);
    printf("byaddr:  %s\n", byaddr->h_name);

    byname=gethostbyname("mail.west.spy.net");
    assert(byname);
    printf("byname:  %s\n", byname->h_name);

    printf("\nReprinting:\n\n");

    /* If both calls share one static hostent, these two lines match. */
    printf("byaddr:  %s\n", byaddr->h_name);
    printf("byname:  %s\n", byname->h_name);

    return 0;
}
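
The effect this C test looks for can be modelled without a network.
What follows is a toy Python sketch, not a description of any real C
library: FakeResolver, its _static field and the example names are all
hypothetical.  It only assumes that both lookups return a reference to
one shared result buffer, which is the behaviour suspected above.

```python
class FakeResolver:
    """Toy model of a resolver whose lookups share one static buffer."""

    def __init__(self):
        # Stands in for a single static struct hostent.
        self._static = {"h_name": None}

    def gethostbyaddr(self, addr):
        self._static["h_name"] = "name-for-" + addr
        return self._static  # a reference to the buffer, not a copy

    def gethostbyname(self, name):
        self._static["h_name"] = name
        return self._static  # the very same buffer again


resolver = FakeResolver()

byaddr = resolver.gethostbyaddr("10.0.0.1")
first_answer = byaddr["h_name"]  # copied out before the next call

byname = resolver.gethostbyname("mail.example.net")

# Both variables now refer to the same buffer, so the earlier
# result has been overwritten by the later lookup.
print("byaddr:", byaddr["h_name"])
print("byname:", byname["h_name"])
```

Both print calls show mail.example.net, while first_answer still holds
the value copied out before the second call.  In this model a lock
around only one of the two functions cannot protect the other.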


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-04 12:06

Message:
Logged In: YES 
user_id=21627

I'm not sure what problem you are reporting. Python does not
attempt to invoke gethostbyname from two threads
simultaneously; this is prevented by the GIL.

On some systems, gethostbyname is reentrant (in the
gethostbyname_r incarnation); Python uses that where
available, and releases the GIL before calling it.

So I fail to see the bug.

----------------------------------------------------------------------
