[Python-Dev] An infinite loop in dictobject.c

Daniel Farina daniel at heroku.com
Thu May 24 21:11:58 CEST 2012


Hello all.  I seem to be encountering somewhat rare an infinite loop
in hash table probing while importing _socket, as triggered by
init_socket.c in Python 2.6, as seen/patched shipped with Ubuntu 10.04
LTS.  The problem only reproduces on 32 bit machines, on both -O2 and
-O0 builds (which is how I have managed to retrieve the detailed stack
traces below).  To cut to the chase, the bottom of the stack trace
invariably looks like this, in particular the "key" (and therefore
"hash") value is always the same:

#0  0x08088637 in lookdict_string (mp=0xa042714, key='SO_RCVTIMEO',
    hash=612808203) at ../Objects/dictobject.c:421
#1  0x080886cd in insertdict (mp=0xa042714, key='SO_RCVTIMEO', hash=612808203,
    value=20) at ../Objects/dictobject.c:450
#2  0x08088cac in PyDict_SetItem (op=<unknown at remote 0x37>, key=
    'SO_RCVTIMEO', value=20) at ../Objects/dictobject.c:701
#3  0x0808b8d4 in PyDict_SetItemString (v=
    {'AF_INET6': 10, 'SocketType': <type at remote 0x8275e00>,
'getaddrinfo': <built-in function getaddrinfo>,
'TIPC_MEDIUM_IMPORTANCE': 1, 'htonl': <built-in function htonl>,
'AF_UNSPEC': 0, 'TIPC_DEST_DROPPABLE': 129, 'TIPC_ADDR_ID': 3,
'PF_PACKET': 17, 'AF_WANPIPE': 25, 'PACKET_OTHERHOST': 3, 'AF_AX25':
3, 'PACKET_BROADCAST': 1, 'PACKET_FASTROUTE': 6, 'TIPC_NODE_SCOPE': 3,
'inet_pton': <built-in function inet_pton>, 'AF_ATMPVC': 8,
'NETLINK_IP6_FW': 13, 'NETLINK_ROUTE': 0, 'TIPC_PUBLISHED': 1,
'TIPC_WITHDRAWN': 2, 'AF_ECONET': 19, 'AF_LLC': 26, '__name__':
'_socket', 'AF_NETROM': 6, 'SOCK_RDM': 4, 'AF_IRDA': 23, 'htons':
<built-in function htons>, 'SOCK_RAW': 3, 'inet_ntoa': <built-in
function inet_ntoa>, 'AF_NETBEUI': 13, 'AF_NETLINK': 16,
'TIPC_WAIT_FOREVER': -1, 'AF_UNIX': 1, 'TIPC_SUB_PORTS': 1,
'HCI_TIME_STAMP': 3, 'gethostbyname_ex': <built-in function
gethostbyname_ex>, 'SO_RCVBUF': 8, 'AF_APPLETALK': 5,
'SOCK_SEQPACKET': 5, 'AF_DECnet': 12, 'PACKET_OUTGOING': 4,
'SO_SNDLOWAT': 19, 'TIPC_SRC_DROPPABLE':...(truncated), key=0x81ac5fb
"SO_RCVTIMEO", item=20) at ../Objects/dictobject.c:2301
#4  0x080f6c98 in PyModule_AddObject (m=<module at remote 0xb73cac8c>, name=
    0x81ac5fb "SO_RCVTIMEO", o=20) at ../Python/modsupport.c:615
#5  0x080f6d0b in PyModule_AddIntConstant (m=<module at remote 0xb73cac8c>,
    name=0x81ac5fb "SO_RCVTIMEO", value=20) at ../Python/modsupport.c:627
#6  0x081321fd in init_socket () at ../Modules/socketmodule.c:4708

Here, we never escape from lookdict_string.  The key is not in the
dictionary, but at this stage Python is trying to figure out that is
the case, and cannot seem to exit because of the lack of a dummy
entry.  Furthermore, every single reproduced case has a dictionary
with a suspicious looking violation of an invariant that I believe is
communicated by the source of dictobject.c, with emphasis on the
values of ma_fill, ma_used, and ma_mask, which never deviate in any
reproduced case.  It seems like no hash table should ever get this
full, per the comments in the source:

$3 = {ob_refcnt = 1, ob_type = 0x81c3aa0, ma_fill = 128, ma_used = 128,
  ma_mask = 127, ma_table = 0xa06b4a8, ma_lookup =
    0x8088564 <lookdict_string>, ma_smalltable = {{me_hash = 0, me_key = 0x0,
      me_value = 0x0}, {me_hash = 1023053529, me_key = '__name__', me_value =
    '_socket'}, {me_hash = 1679430097, me_key = 'gethostbyname', me_value =
    <built-in function gethostbyname>}, {me_hash = 0, me_key = 0x0, me_value =
    0x0}, {me_hash = 779452068, me_key = 'gethostbyname_ex', me_value =
    <built-in function gethostbyname_ex>}, {me_hash = -322108099, me_key =
    '__doc__', me_value = None}, {me_hash = -1649837379, me_key =
    'gethostbyaddr', me_value = <built-in function gethostbyaddr>}, {
      me_hash = 1811348911, me_key = '__package__', me_value = None}}}

The Python program that is running afoul this bug is using gevent, but
the stack traces suggest that all gevent is doing at the time this
crashes is importing "socket", and this is done at the very, very
beginning of program execution.

Finally, what's especially strange is that I had gone a very long time
running this exact version of Python, libraries, and application quite
frequently: it suddenly started cropping up a little while ago (maybe
a few weeks).  It could have been just coincidence, but if there are
code paths in init_socket.c that may somehow be sensitive to the
network somehow, this could have been related.  I also have a limited
suspicion that particularly unlucky OOM (these systems are configured
in a way where malloc and friends will return NULL, i.e. no overcommit
on Linux) could be related.

Any guiding words, known bugs, or suspicions?

-- 
fdr


More information about the Python-Dev mailing list