[ python-Bugs-1041645 ] Thread management corrupts heap

SourceForge.net noreply at sourceforge.net
Thu Oct 7 04:24:29 CEST 2004


Bugs item #1041645, was opened at 2004-10-06 14:15
Message generated for change (Comment added) made by benson_basis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1041645&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Submitted By: benson margulies (benson_basis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Thread management corrupts heap

Initial Comment:
The PyGILState_Ensure mechanism appears to have a 
built-in heap-corrupting race condition.

If a new thread calls PyGILState_Ensure, then the code 
allocates a new 'tstate' for it on the heap. This 
allocation is not protected by any lock. So, multiple 
racing threads can hit the heap at the same time, and 
corrupt it.

I have observed this with both 2.3 and with 2.4a3.

I will attach a sample application. The application is 
Win32, but should be easy enough to adapt to Unix if 
someone cares to.

Since the stated purpose of this mechanism, in PEP311, 
is to allow any-old-thread to call into Python, I believe 
that the usage model here is legitimate.

To watch this explode, run the attached with arguments 
like, oh, 1 100 40 against the debug python build.




----------------------------------------------------------------------

>Comment By: benson margulies (benson_basis)
Date: 2004-10-06 22:24

Message:
Logged In: YES 
user_id=876734

I'm afraid that something is more seriously wrong here than 
just the malloc in Ensure.

Here's what I did in our production code. I protected all calls 
to Ensure with a mutex. The result was a similiar problem with 
the debug heap happening inside code that was executing 
under the GIL. Given the structure of our code, I'm 
reasonably confident that there is no rogue thread running 
around outside the GIL. In fact, at the time of the explosion 
here, the other threads are all waiting their turn to get the 
GIL.

 	btpython23_d.dll!PyObject_Malloc(unsigned int 
nbytes=98)  Line 581 + 0x6	C
 	btpython23_d.dll!_PyObject_DebugMalloc(unsigned 
int nbytes=82)  Line 990 + 0x9	C
 	btpython23_d.dll!PyString_FromStringAndSize(const 
char * str=0x03176384, int size=50)  Line 78 + 0xc	C
 	btpython23_d.dll!string_slice(PyStringObject * 
a=0x03176368, int i=0, int j=50)  Line 1006 + 0x17	C
 	btpython23_d.dll!PySequence_GetSlice(_object * 
s=0x03176368, int i1=0, int i2=50)  Line 1218 + 0x12	C
 	btpython23_d.dll!apply_slice(_object * 
u=0x03176368, _object * v=0x00000000, _object * 
w=0x02fb7acc)  Line 3806 + 0x11	C
 	btpython23_d.dll!eval_frame(_frame * 
f=0x031aeeb0)  Line 1391 + 0x11	C
 	btpython23_d.dll!PyEval_EvalCodeEx(PyCodeObject 
* co=0x030c9130, _object * globals=0x03101ab8, _object * 
locals=0x00000000, _object * * args=0x0323650c, int 
argcount=2, _object * * kws=0x00000000, int kwcount=0, 
_object * * defs=0x030eed0c, int defcount=2, _object * 
closure=0x00000000)  Line 2663 + 0x9	C
 	btpython23_d.dll!function_call(_object * 
func=0x0316eea8, _object * arg=0x032364f8, _object * 
kw=0x00000000)  Line 509 + 0x40	C
 	btpython23_d.dll!PyObject_Call(_object * 
func=0x0316eea8, _object * arg=0x032364f8, _object * 
kw=0x00000000)  Line 1755 + 0xf	C
 	btpython23_d.dll!PyEval_CallObjectWithKeywords
(_object * func=0x0316eea8, _object * arg=0x032364f8, 
_object * kw=0x00000000)  Line 3346 + 0x11	C
 	btpython23_d.dll!PyErr_Warn(_object * 
category=0x0306da10, char * message=0x1e14df70)  Line 
627 + 0xf	C
>	btpython23_d.dll!builtin_apply(_object * 
self=0x00000000, _object * args=0x03238a78)  Line 80 + 
0x10	C
 	btpython23_d.dll!PyCFunction_Call(_object * 
func=0x0305fb38, _object * arg=0x03238a78, _object * 
kw=0x00000000)  Line 73 + 0xb	C
 	btpython23_d.dll!call_function(_object * * * 
pp_stack=0x0364f2b0, int oparg=2)  Line 3439 + 0xf	C
 	btpython23_d.dll!eval_frame(_frame * 
f=0x0314a310)  Line 2116 + 0xd	C
 	btpython23_d.dll!PyEval_EvalCodeEx(PyCodeObject 
* co=0x03178868, _object * globals=0x03174818, _object * 
locals=0x00000000, _object * * args=0x0d558f0c, int 
argcount=2, _object * * kws=0x00000000, int kwcount=0, 
_object * * defs=0x00000000, int defcount=0, _object * 
closure=0x00000000)  Line 2663 + 0x9	C
 	btpython23_d.dll!function_call(_object * 
func=0x0320f198, _object * arg=0x0d558ef8, _object * 
kw=0x00000000)  Line 509 + 0x40	C
 	btpython23_d.dll!PyObject_Call(_object * 
func=0x0320f198, _object * arg=0x0d558ef8, _object * 
kw=0x00000000)  Line 1755 + 0xf	C
 	btpython23_d.dll!instancemethod_call(_object * 
func=0x0320f198, _object * arg=0x0d558ef8, _object * 
kw=0x00000000)  Line 2432 + 0x11	C
 	btpython23_d.dll!PyObject_Call(_object * 
func=0x0321fd38, _object * arg=0x030728f8, _object * 
kw=0x00000000)  Line 1755 + 0xf	C
 	btpython23_d.dll!PyObject_CallMethod(_object * 
o=0x0317a4f8, char * name=0x64c453f4, char * 
format=0x64c453f0, ...)  Line 1844 + 0xf	C
 	btrexcoreC230.dll!BT_REX_LP_PyProxy::Run
(BT_Blackboard * ref=0x02eacc88)  Line 136 + 0x1b	C++
 	btrexcoreC230.dll!BT_Blackboard::Run()  Line 166 + 
0x29	C++
 	btrexcoreC230.dll!
BT_REX_ContextImp::ProcessUTF16Buffer(const unsigned 
short * inbuf=0x00ab78bc, unsigned int inlen=673, int lid=30)  
Line 609	C++
 	basistechnology.rlp.dll!
BasisTechnology.RLP.ContextImp.ProcessBuffer(String* data 
= "South African answers U.S. message in a bottle. 

JOHANNESBURG 
08-22-1996 

A South African boy is writing back to an American girl whose 
message in a bottle he found washed up on President Nelson 
Mandela's old prison island. But Carlo Hoffmann, an 11-year-
old jailer's son who found the bottle on the beach at Robben 
Island off Cape Town after winter storms, will send his letter 
back by ordinary mail on Thursday, the post office said. It will 
be sent for free. Danielle Murray from Sandusky, Ohio, the 
same age as her new penfriend, asked for a reply from 
whoever received the message she flung on its journey 
months ago on the other side of the Atlantic Ocean. 
", BasisTechnology.RLP.LanguageID __unnamed001 = 
ENGLISH) Line 84 + 0x33 bytes	C++
 	ThreadStress.exe!
BasisTechnology.RLPUnitTest.Worker.Go() Line 63	C#


----------------------------------------------------------------------

Comment By: benson margulies (benson_basis)
Date: 2004-10-06 21:57

Message:
Logged In: YES 
user_id=876734

The rather funny-looking initialization was recommended by 
the author of the PEP. Once InitThreads is called, the GIL is 
held. To get it unheld, you have to release it. Here's his 
explanation ..

OK - here is my explanation which you can adapt
 
A simple application is one that initializes Python, and then 
executes some Python code.  This Python code may itself 
cause other threads, or may indirectly cause other 
applications to "callback" into Python, but once the main 
thread returns, all the code has finished - you application 
terminates. With these applications, your init code need take 
no further action - just initialize Python and execute your 
code.  Only your extension functions or callback entry points 
need manage the GIL state.  python.exe is an example of this 
kind of app - it initializes Python, then executes the code 
specified on the command line.
 
More complex applications may need to initialize Python, 
perform a little bit of initialization, but then do nothing else on 
that thread.  An example would be where that initialization 
code bootstraps a few threads, then terminates immediately.  
The application however continues to live until explicitly 
terminated by the running code.
 
In this scenario, you must use the following code - this code 
calls into Python.  Once this call has been made, the thread 
still holds the GIL.  This thread must release the GIL before 
other threads can work.
 
[Technical aside: This works in the first case as the main 
thread which holds the lock continues to run.  When it calls 
out to external libraries and at periodic times during 
execution, the thread-lock is unlocked allowing other threads 
to run.  Once the main thread returns, it does still hold the 
lock - but as the application is terminating, that is OK - no 
other threads are competing for the lock.  In our complicated 
example, the main thread returning does *not* indicate 
application termination, so the lock must be manually 
dropped.]
 
That doesn't make a huge amount of sense, but I hope it is 
useful.


All this being said, I have two contradictory bits of evidence.

1) We all agree that the code, as written, would fail due to 
the unsafe debug malloc.

2) When I reverse the calls, it produces a very different 
situation. I think, perhaps, that the GIL never gets unlocked 
properly, and then instead of crashing everything just gets 
stuck somehow.

The change to call malloc seems pretty reasonable to me. 
Some tests seem called for in the tree on this. If order is not 
supposed to matter, the test should test both orders.




----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2004-10-06 20:46

Message:
Logged In: YES 
user_id=1038590

I'd suggest assigning the bug report to Tim Peters (aka
tim_one), to see if he still agrees with the fix he proposed
last month. 

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2004-10-06 20:43

Message:
Logged In: YES 
user_id=1038590

*rereads bug description and comments*

I think I'm on the right page now. The offending call would
appear to be PyMem_NEW inside PyThreadState_New.

Under a release build, this resolves into a direct call to
malloc(), but under debug it resolves to _PyObjectDebugMalloc.

Which is what you said in the bug report - I just read it
incorrectly. Sorry for the confusion.

I just remembered that this came up a few weeks ago on
Python dev, with a proposed solution (changing this
particular call to use malloc() directly):
http://mail.python.org/pipermail/python-dev/2004-September/048683.html



----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2004-10-06 20:15

Message:
Logged In: YES 
user_id=1038590

Disregard the previous comment regarding PyEval_InitThreads
and Py_Initialize - I was reading an older version of the
documentation which suggested the order of the calls mattered.

This does leak the constructed strings (no call to
Py_XDECREF(trash)), but that shouldn't cause anything too
catastrophic.

However, the call to PyGILState_Release without a preceding
call to PyGILState_Ensure looks dubious (it ends up deleting
the main thread's thread state, which seems unhealthy).

Do the same symptoms occur when using the
Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros
around the call to run_group()? (These macros are
recommended when you *know* you already have the GIL. The
PyGILState functions are for when you don't know if you have
the GIL or not)

----------------------------------------------------------------------

Comment By: benson margulies (benson_basis)
Date: 2004-10-06 19:56

Message:
Logged In: YES 
user_id=876734

On the one hand, you are correct that changing the order 
improves the behavior. On the other hand, the doc is as I 
recalled it, asserting that you can call it first. Perhaps this 
should be registered as a doc problem?

This is a no-op when called for a second time. It is safe to 
call this function before calling Py_Initialize() . 

----------------------------------------------------------------------

Comment By: benson margulies (benson_basis)
Date: 2004-10-06 19:49

Message:
Logged In: YES 
user_id=876734

The documentation very specifically states that the order 
doesn't matter. Further, I've read the code, and the problem 
is unprotected data structures in the python debug heap, 
which won't be any more protected in the other order.

I'll switch them and report back, just in case.


----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2004-10-06 19:32

Message:
Logged In: YES 
user_id=1038590

I haven't checked if that would fix the problem, since I'm
currently on Linux.

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2004-10-06 19:31

Message:
Logged In: YES 
user_id=1038590

The attached program calls PyEval_InitThreads and
Py_Initialize in the wrong order.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1041645&group_id=5470


More information about the Python-bugs-list mailing list