An infinite loop in dictobject.c
Hello all.

I seem to be encountering a somewhat rare infinite loop in hash table probing while importing _socket, as triggered by init_socket (in socketmodule.c) in Python 2.6, as patched and shipped with Ubuntu 10.04 LTS. The problem only reproduces on 32-bit machines, on both -O2 and -O0 builds (which is how I have managed to retrieve the detailed stack traces below).

To cut to the chase, the bottom of the stack trace invariably looks like this; in particular, the "key" (and therefore "hash") value is always the same:

#0 0x08088637 in lookdict_string (mp=0xa042714, key='SO_RCVTIMEO', hash=612808203) at ../Objects/dictobject.c:421
#1 0x080886cd in insertdict (mp=0xa042714, key='SO_RCVTIMEO', hash=612808203, value=20) at ../Objects/dictobject.c:450
#2 0x08088cac in PyDict_SetItem (op=<unknown at remote 0x37>, key='SO_RCVTIMEO', value=20) at ../Objects/dictobject.c:701
#3 0x0808b8d4 in PyDict_SetItemString (v={'AF_INET6': 10, 'SocketType': <type at remote 0x8275e00>, 'getaddrinfo': <built-in function getaddrinfo>, 'TIPC_MEDIUM_IMPORTANCE': 1, 'htonl': <built-in function htonl>, 'AF_UNSPEC': 0, 'TIPC_DEST_DROPPABLE': 129, 'TIPC_ADDR_ID': 3, 'PF_PACKET': 17, 'AF_WANPIPE': 25, 'PACKET_OTHERHOST': 3, 'AF_AX25': 3, 'PACKET_BROADCAST': 1, 'PACKET_FASTROUTE': 6, 'TIPC_NODE_SCOPE': 3, 'inet_pton': <built-in function inet_pton>, 'AF_ATMPVC': 8, 'NETLINK_IP6_FW': 13, 'NETLINK_ROUTE': 0, 'TIPC_PUBLISHED': 1, 'TIPC_WITHDRAWN': 2, 'AF_ECONET': 19, 'AF_LLC': 26, '__name__': '_socket', 'AF_NETROM': 6, 'SOCK_RDM': 4, 'AF_IRDA': 23, 'htons': <built-in function htons>, 'SOCK_RAW': 3, 'inet_ntoa': <built-in function inet_ntoa>, 'AF_NETBEUI': 13, 'AF_NETLINK': 16, 'TIPC_WAIT_FOREVER': -1, 'AF_UNIX': 1, 'TIPC_SUB_PORTS': 1, 'HCI_TIME_STAMP': 3, 'gethostbyname_ex': <built-in function gethostbyname_ex>, 'SO_RCVBUF': 8, 'AF_APPLETALK': 5, 'SOCK_SEQPACKET': 5, 'AF_DECnet': 12, 'PACKET_OUTGOING': 4, 'SO_SNDLOWAT': 19, 'TIPC_SRC_DROPPABLE':...(truncated), key=0x81ac5fb "SO_RCVTIMEO", item=20) at ../Objects/dictobject.c:2301
#4 0x080f6c98 in PyModule_AddObject (m=<module at remote 0xb73cac8c>, name=0x81ac5fb "SO_RCVTIMEO", o=20) at ../Python/modsupport.c:615
#5 0x080f6d0b in PyModule_AddIntConstant (m=<module at remote 0xb73cac8c>, name=0x81ac5fb "SO_RCVTIMEO", value=20) at ../Python/modsupport.c:627
#6 0x081321fd in init_socket () at ../Modules/socketmodule.c:4708

Here, we never escape from lookdict_string. The key is not in the dictionary, but at this stage Python is still trying to establish that this is the case, and it cannot seem to exit because the table has no unused entry left to terminate the probe.

Furthermore, every reproduced case has a dictionary that appears to violate an invariant which I believe is documented in the source of dictobject.c. Note the values of ma_fill, ma_used, and ma_mask, which are identical in every reproduced case. Per the comments in the source, no hash table should ever get this full:

$3 = {ob_refcnt = 1, ob_type = 0x81c3aa0, ma_fill = 128, ma_used = 128, ma_mask = 127, ma_table = 0xa06b4a8, ma_lookup = 0x8088564 <lookdict_string>,
  ma_smalltable = {
    {me_hash = 0, me_key = 0x0, me_value = 0x0},
    {me_hash = 1023053529, me_key = '__name__', me_value = '_socket'},
    {me_hash = 1679430097, me_key = 'gethostbyname', me_value = <built-in function gethostbyname>},
    {me_hash = 0, me_key = 0x0, me_value = 0x0},
    {me_hash = 779452068, me_key = 'gethostbyname_ex', me_value = <built-in function gethostbyname_ex>},
    {me_hash = -322108099, me_key = '__doc__', me_value = None},
    {me_hash = -1649837379, me_key = 'gethostbyaddr', me_value = <built-in function gethostbyaddr>},
    {me_hash = 1811348911, me_key = '__package__', me_value = None}}}

The Python program that is running afoul of this bug uses gevent, but the stack traces suggest that all gevent is doing when this occurs is importing "socket", and this is done at the very beginning of program execution.

Finally, what's especially strange is that I had gone a very long time running this exact version of Python, libraries, and application quite frequently: it suddenly started cropping up a little while ago (maybe a few weeks). It could have been just coincidence, but if there are code paths in init_socket that are somehow sensitive to the network, this could be related. I also have a limited suspicion that a particularly unlucky OOM could be related (these systems are configured so that malloc and friends will return NULL, i.e. no overcommit on Linux). Any guiding words, known bugs, or suspicions? -- fdr
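To make the failure mode concrete, here is a minimal sketch of the open-addressing probe, modelled on my reading of lookdict_string in CPython 2.x. It is a simplification under stated assumptions: the function name probe, the list-of-keys table, and the max_steps cap are all hypothetical, and dummy entries and string comparison details are left out. It only illustrates that the probe sequence has no exit when the key is absent and no slot is unused.

```python
PERTURB_SHIFT = 5  # same value as the constant in CPython 2.x dictobject.c


def probe(slots, key, key_hash, max_steps=10000):
    """Return the slot index where `key` lives or would be inserted.

    `slots` is a list whose length is a power of two; None marks an unused
    slot. Returns None when nothing is found within `max_steps` probes.
    The cap is an addition of this sketch: the C loop has no such limit.
    """
    mask = len(slots) - 1
    perturb = key_hash & 0xFFFFFFFF      # treated as unsigned, 32-bit build
    i = perturb & mask
    for _ in range(max_steps):
        index = i & mask
        if slots[index] is None or slots[index] == key:
            return index
        # Same recurrence as the C code: i = (i << 2) + i + perturb + 1
        i = (5 * i + perturb + 1) & 0xFFFFFFFF
        perturb >>= PERTURB_SHIFT
    return None
```

With a 128-slot table in which every slot holds some other key, probe(slots, "SO_RCVTIMEO", 612808203) exhausts the cap and returns None, which corresponds to the C loop spinning forever. Freeing any single slot lets the same call return, because once perturb has decayed to zero the recurrence visits every slot of a power-of-two table.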
Daniel Farina wrote:
Hello all. I seem to be encountering a somewhat rare infinite loop in hash table probing while importing _socket, as triggered by init_socket (in socketmodule.c) in Python 2.6, as patched and shipped with Ubuntu 10.04 LTS. The problem only reproduces on 32-bit machines, on both -O2 and -O0 builds (which is how I have managed to retrieve the detailed stack
Please submit a report to the tracker for this. (Add me to the nosy list if you can) Cheers, Mark.
On Thu, May 24, 2012 at 12:59 PM, Mark Shannon <mark@hotpy.org> wrote:
Please submit a report to the tracker for this. (Add me to the nosy list if you can)
http://bugs.python.org/issue14903 However, I cannot add you to the nosy list, as you do not show up in the search. -- fdr
On 5/24/2012 4:20 PM, Daniel Farina wrote:
On Thu, May 24, 2012 at 12:59 PM, Mark Shannon<mark@hotpy.org> wrote:
Please submit a report to the tracker for this. (Add me to the nosy list if you can)
http://bugs.python.org/issue14903
However, I cannot add you to the nosy list, as you do not show up in the search.
The nosy list box search only shows people with commit privileges. For others, you have to find them in the User List accessible from the sidebar, but that may be admin-only for all I know. Anyway, Mark should have said 'as Mark.Shannon'. I have added him on the issue. -- Terry Jan Reedy
If it only started happening recently, suspicion would naturally fall on the hash randomisation security fix (as I assume a new version of Python would have been pushed for 10.04 with that update) Cheers, Nick. -- Sent from my phone, thus the relative brevity :)
On May 25, 2012, at 06:07 AM, Nick Coghlan wrote:
If it only started happening recently, suspicion would naturally fall on the hash randomisation security fix (as I assume a new version of Python would have been pushed for 10.04 with that update)
I do not think the hash randomization patch has been pushed to Python 2.6 in Lucid 10.04.4 yet, which still has Python 2.6.5 (plus patches, but not that one). -Barry
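One way to test the randomisation theory from the trace alone would be to recompute the unrandomised hash of the key and compare it with the hash value gdb reports. The following is a sketch of the string hash as I understand it from CPython 2.x stringobject.c, emulating 32-bit signed arithmetic. The function name py2_string_hash_32 is hypothetical, and the code is written for Python 3 (it takes bytes) purely so that it can be run anywhere.

```python
def py2_string_hash_32(data):
    """Unrandomised str hash as a 32-bit CPython 2.x build would compute it.

    `data` is a bytes object. This is a reconstruction for illustration,
    not code taken from the interpreter.
    """
    if not data:
        return 0
    x = (data[0] << 7) & 0xFFFFFFFF
    for byte in data:
        # Multiply-and-xor step, truncated to 32 bits like a C long here
        x = ((1000003 * x) ^ byte) & 0xFFFFFFFF
    x ^= len(data)
    if x >= 0x80000000:
        x -= 0x100000000          # reinterpret as a signed 32-bit value
    return -2 if x == -1 else x   # -1 is reserved as an error marker
```

If py2_string_hash_32(b"SO_RCVTIMEO") equals the hash=612808203 shown in frame #0, that would suggest the build in question is not randomising string hashes; a mismatch would point the other way, assuming the reconstruction above is faithful.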
On Thu, May 24, 2012 at 1:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
If it only started happening recently, suspicion would naturally fall on the hash randomisation security fix (as I assume a new version of Python would have been pushed for 10.04 with that update)
I do not think so; I do not see it in the patches backported to 2.6.5 (http://packages.ubuntu.com/lucid/python2.6), unless they are particularly slick. -- fdr
On Thu, 24 May 2012 12:11:58 -0700 Daniel Farina <daniel@heroku.com> wrote:
Finally, what's especially strange is that I had gone a very long time running this exact version of Python, libraries, and application quite frequently: it suddenly started cropping up a little while ago (maybe a few weeks).
Do you mean it's a hand-compiled Python? Are you sure you didn't recompile it / update to a later version recently? If you didn't change anything, this may be something unrelated to Python, such as a hardware problem. You are right that ma_fill == ma_used should, AFAIK, never happen. Perhaps you could add conditional debug statements when that condition happens, to know where it comes from. Furthermore, if this is a hand-compiled Python, you could reconfigure it --with-pydebug, so as to enable more assertions in the interpreter core (this will make it quite a bit slower too :-)). Regards Antoine.
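As a rough illustration of what such a conditional check could test, here is a sketch of the relations that, as far as I understand dictobject.c in 2.x, should hold between the three counters once an insertion has completed successfully. The function name dict_invariant_violations is hypothetical, and the two-thirds threshold reflects my reading of the resize condition in PyDict_SetItem rather than a documented guarantee.

```python
def dict_invariant_violations(ma_fill, ma_used, ma_mask):
    """Return a list of human-readable problems with the given counters.

    ma_fill counts active plus dummy entries, ma_used counts active
    entries only, and ma_mask is the table size minus one.
    """
    size = ma_mask + 1
    problems = []
    if ma_used > ma_fill:
        problems.append("ma_used exceeds ma_fill")
    if ma_fill >= size:
        problems.append("no unused slot left; a failed lookup cannot terminate")
    if ma_fill * 3 >= size * 2:
        problems.append("table is at least two-thirds full; a resize was due")
    return problems
```

Fed the values from the gdb dump (ma_fill = 128, ma_used = 128, ma_mask = 127), this reports both the missing unused slot and the exceeded load factor, whereas a healthy small dict such as (5, 5, 7) reports nothing. In the interpreter itself the equivalent would be a C condition placed near the insertion path, which is where the first violating insert would show up.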
On Thu, May 24, 2012 at 1:15 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 24 May 2012 12:11:58 -0700 Daniel Farina <daniel@heroku.com> wrote:
Finally, what's especially strange is that I had gone a very long time running this exact version of Python, libraries, and application quite frequently: it suddenly started cropping up a little while ago (maybe a few weeks).
Do you mean it's a hand-compiled Python? Are you sure you didn't recompile it / update to a later version recently?
Quite sure. It was vanilla Ubuntu, and then I side-graded it to vanilla Ubuntu at -O0.
If you didn't change anything, this may be something unrelated to Python, such as a hardware problem.
It occurs on a smattering of systems, out of a large number, more or less at random, so a hardware problem seems unlikely.
You are right that ma_fill == ma_used should, AFAIK, never happen. Perhaps you could add conditional debug statements when that condition happens, to know where it comes from.
Furthermore, if this is a hand-compiled Python, you could reconfigure it --with-pydebug, so as to enable more assertions in the interpreter core (this will make it quite a bit slower too :-)).
Yes, this is my next step, although I am going to do a bit more whacking of the interpreter so as to pause rather than crash when it encounters this problem. -- fdr
On Fri, May 25, 2012 at 6:23 AM, Daniel Farina <daniel@heroku.com> wrote:
Furthermore, if this is a hand-compiled Python, you could reconfigure it --with-pydebug, so as to enable more assertions in the interpreter core (this will make it quite a bit slower too :-)).
Yes, this is my next step, although I am going to do a bit more whacking of the interpreter so as to pause rather than crash when it encounters this problem.
You may also want to give Victor's faulthandler module a try: http://pypi.python.org/pypi/faulthandler/ Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
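For a hang rather than a crash, the relevant part of faulthandler is its watchdog, which dumps Python tracebacks after a timeout even if the main thread is stuck inside C code. Below is a minimal sketch using the faulthandler module as it exists in the Python 3 standard library (3.3 and later). The PyPI backport for 2.x may name or support these calls differently, so treat the exact calls as an assumption for that version. The helper names arm_hang_watchdog and disarm_hang_watchdog are hypothetical.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_seconds, log_file=None):
    """Dump all thread tracebacks to `log_file` every `timeout_seconds`.

    `log_file` must have a real file descriptor; it defaults to sys.stderr.
    """
    if log_file is None:
        log_file = sys.stderr
    # Also report fatal signals such as SIGSEGV or SIGABRT
    faulthandler.enable(file=log_file, all_threads=True)
    # The watchdog runs in its own C-level thread, so it still fires when
    # the main thread never returns to the bytecode loop
    faulthandler.dump_traceback_later(timeout_seconds, repeat=True, file=log_file)


def disarm_hang_watchdog():
    """Cancel the pending dump and stop handling fatal signals."""
    faulthandler.cancel_dump_traceback_later()
    faulthandler.disable()
```

Calling arm_hang_watchdog(30) before the first import of socket would be one way to use it. Note that the dump shows Python-level frames only, so here it would likely point at the import statement rather than at lookdict_string; gdb remains necessary for the C side.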
participants (6)
- Antoine Pitrou
- Barry Warsaw
- Daniel Farina
- Mark Shannon
- Nick Coghlan
- Terry Reedy