[ python-Bugs-1579370 ] Segfault provoked by generators and exceptions

SourceForge.net noreply at sourceforge.net
Mon Jan 22 08:51:20 CET 2007


Bugs item #1579370, was opened at 2006-10-18 04:23
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1579370&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.5
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Mike Klaas (mklaas)
Assigned to: Nobody/Anonymous (nobody)
Summary: Segfault provoked by generators and exceptions

Initial Comment:
A reproducible segfault when using heavily-nested
generators and exceptions.

Unfortunately, I haven't yet been able to provoke this
behaviour with a standalone python2.5 script.  There
are, however, no third-party c extensions running in
the process so I'm fairly confident that it is a
problem in the core.

The gist of the code is a series of nested generators
which leave scope when an exception is raised.  This
exception is caught and re-raised in an outer loop. 
The old exception was holding on to the frame which was
keeping the generators alive, and the sequence of
generator destruction and new finalization caused the
segfault.   
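
As an illustration of the lifetime chain described above (not the
reporter's code), here is a minimal sketch written for Python 3. The names
worker, inner, saved and events are made up for the example. It only shows
that a saved exception keeps the traceback, the frame, and therefore the
suspended generator alive, so the generator's cleanup runs when the old
traceback is finally released. It does not reproduce the crash.

```python
import gc

events = []

def worker():
    try:
        while True:
            yield None
    finally:
        # Runs when the generator is closed or garbage-collected.
        events.append("generator finalized")

def inner():
    gen = worker()   # local variable: lives in inner()'s frame
    next(gen)        # suspend the generator inside its try block
    raise ValueError("boom")

saved = None
try:
    inner()
except ValueError as exc:
    saved = exc      # exception -> traceback -> inner()'s frame -> gen

events.append("exception handled")
saved = None         # releasing the traceback releases the frame and gen
gc.collect()         # only matters on implementations without refcounting
events.append("traceback released")
print(events)
```

On CPython the "generator finalized" entry should appear between the other
two, that is, at the moment the old traceback is dropped rather than when
the exception was raised.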

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2007-01-22 08:51

Message:
Logged In: YES 
user_id=21627
Originator: NO

I don't like mklaas' patch, since I think it is conceptually wrong to have
PyTraceBack_Here() use the frame's thread state (mklaas describes it as
dirty, and I agree). I'm proposing an alternative patch (tr.diff); please
test this as well.
File Added: tr.diff

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-01-17 08:01

Message:
Logged In: YES 
user_id=33168
Originator: NO

Bumping priority to see if this should go into 2.5.1.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-01-04 11:42

Message:
Logged In: YES 
user_id=21627
Originator: NO

Why do frame objects have a thread state in the first place? In
particular, why does PyTraceBack_Here get the thread state from the frame,
instead of using the current thread?

Introduction of f_tstate goes back to r7882, but it is not clear why it
was done that way.

----------------------------------------------------------------------

Comment By: Andrew Waters (awaters)
Date: 2007-01-04 10:35

Message:
Logged In: YES 
user_id=1418249
Originator: NO

This fixes the segfault problem that I was able to reliably reproduce on
Linux.

We need to get this applied (assuming it is the correct fix) to the source
to make Python 2.5 usable for me in production code.

----------------------------------------------------------------------

Comment By: Mike Klaas (mklaas)
Date: 2006-11-27 19:41

Message:
Logged In: YES 
user_id=1611720
Originator: YES

The following patch resets the thread state of the generator when it is
resumed, which prevents the segfault for me:

Index: Objects/genobject.c
===================================================================
--- Objects/genobject.c (revision 52849)
+++ Objects/genobject.c (working copy)
@@ -77,6 +77,7 @@
        Py_XINCREF(tstate->frame);
        assert(f->f_back == NULL);
        f->f_back = tstate->frame;
+        f->f_tstate = tstate;
 
        gen->gi_running = 1;
        result = PyEval_EvalFrameEx(f, exc);
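
A toy Python model of what this one-line change does, for readers who do
not want to read the C. ToyThreadState, ToyFrame and resume are
hypothetical stand-ins, not CPython APIs; the "freed" flag only imitates a
thread state whose memory has been released.

```python
class ToyThreadState:
    """Stand-in for PyThreadState: a plain record that can be freed."""
    def __init__(self, name):
        self.name = name
        self.freed = False

class ToyFrame:
    """Stand-in for a generator frame caching its creator's thread state."""
    def __init__(self, tstate):
        self.f_tstate = tstate

def resume(frame, current_tstate, rebind):
    """Model resuming a generator frame from the current thread."""
    if rebind:
        # The patch: point the frame at the thread that resumes it.
        frame.f_tstate = current_tstate
    if frame.f_tstate.freed:
        raise RuntimeError("frame points at a freed thread state")
    return frame.f_tstate.name

worker = ToyThreadState("worker")
main = ToyThreadState("main")
frame = ToyFrame(worker)
worker.freed = True   # the creating thread exits; its state is released

try:
    resume(frame, main, rebind=False)
    outcome = "ok"
except RuntimeError as err:
    outcome = str(err)

rebound = resume(frame, main, rebind=True)
print(outcome)
print(rebound)
```

Without the rebind the model trips over the stale pointer; with it, the
frame uses the state of the thread that is actually running it.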

----------------------------------------------------------------------

Comment By: Eric Noyau (eric_noyau)
Date: 2006-11-27 19:07

Message:
Logged In: YES 
user_id=1388768
Originator: NO

We are experiencing the same segfault in our application, reliably.
Running our unit test suite segfaults every time on both Linux and Mac
OS X. Applying Martin's patch fixes the segfault and makes everything fine
and dandy, at the cost of some memory leaks, if I understand properly.

This particular bug prevents us from upgrading to Python 2.5 in production.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-10-28 07:18

Message:
Logged In: YES 
user_id=31435

> I tried Tim's hope.py on Linux x86_64 and
> Mac OS X 10.4 with debug builds and neither
> one crashed.  Tim's guess looks pretty damn
> good too.

Neal, note that it's the /Windows/ malloc that fills freed
memory with "dangerous bytes" in a debug build -- this
really has nothing to do with that it's a debug build of
/Python/ apart from that on Windows a debug build of Python
also links in the debug version of Microsoft's malloc.

The valgrind report is pointing at the same thing.  Whether
this leads to a crash is purely an accident of when and how
the system malloc happens to reuse the freed memory.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-10-28 06:56

Message:
Logged In: YES 
user_id=33168

Mike, what platform are you having the problem on?

I tried Tim's hope.py on Linux x86_64 and Mac OS X 10.4 with
debug builds and neither one crashed.  Tim's guess looks
pretty damn good too.  Here's the result of valgrind:

Invalid read of size 8
   at 0x4CEBFE: PyTraceBack_Here (traceback.c:117)
   by 0x49C1F1: PyEval_EvalFrameEx (ceval.c:2515)
   by 0x4F615D: gen_send_ex (genobject.c:82)
   by 0x4F6326: gen_close (genobject.c:128)
   by 0x4F645E: gen_del (genobject.c:163)
   by 0x4F5F00: gen_dealloc (genobject.c:31)
   by 0x44D207: _Py_Dealloc (object.c:1928)
   by 0x44534E: dict_dealloc (dictobject.c:801)
   by 0x44D207: _Py_Dealloc (object.c:1928)
   by 0x4664FF: subtype_dealloc (typeobject.c:686)
   by 0x44D207: _Py_Dealloc (object.c:1928)
   by 0x42325D: instancemethod_dealloc (classobject.c:2287)
 Address 0x56550C0 is 88 bytes inside a block of size 152 free'd
   at 0x4A1A828: free (vg_replace_malloc.c:233)
   by 0x4C3899: tstate_delete_common (pystate.c:256)
   by 0x4C3926: PyThreadState_DeleteCurrent (pystate.c:282)
   by 0x4D4043: t_bootstrap (threadmodule.c:448)
   by 0x4B24C48: pthread_start_thread (in /lib/libpthread-0.10.so)

The only way I can think of to fix this is to keep a set of
active generators in the PyThreadState and call
gen_send_ex(exc=1) for all the active generators before
killing the tstate in t_bootstrap.
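
A rough Python-level analogy of this idea, written for Python 3. The
actual proposal concerns C code in the interpreter; the helpers track,
close_tracked and run_with_cleanup are hypothetical names invented for
this sketch. It registers generators per thread and closes them before
the thread ends, so that their cleanup runs while the creating thread
still exists.

```python
import threading
import weakref

_local = threading.local()

def track(gen):
    """Register gen as belonging to the current thread."""
    if not hasattr(_local, "gens"):
        _local.gens = weakref.WeakSet()
    _local.gens.add(gen)
    return gen

def close_tracked():
    """Close every generator still registered for the current thread."""
    for gen in list(getattr(_local, "gens", ())):
        gen.close()

def run_with_cleanup(target):
    """Wrap a thread target so its generators are closed before it ends."""
    def wrapper():
        try:
            target()
        finally:
            close_tracked()
    return wrapper

closed_in = []
survivors = []

def numbers():
    try:
        while True:
            yield 0
    finally:
        closed_in.append(threading.current_thread().name)

def job():
    gen = track(numbers())
    next(gen)               # suspend the generator inside its try block
    survivors.append(gen)   # the generator object outlives the thread

t = threading.Thread(target=run_with_cleanup(job), name="worker")
t.start()
t.join()
print(closed_in)
```

The weak set is a design choice for the sketch: it lets generators that
are already unreachable disappear from the registry without being kept
alive by it.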

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2006-10-19 09:58

Message:
Logged In: YES 
user_id=6656

> and for some reason Python uses the system malloc directly
> to obtain memory for thread states.

This bit is fairly easy: they are allocated without the GIL being held,
which breaks an assumption of PyMalloc.

No idea about the real problem, sadly.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-10-19 02:38

Message:
Logged In: YES 
user_id=31435

I've attached a much simplified pure-Python script (hope.py)
that reproduces a problem very quickly, on Windows, in a
/debug/ build of current trunk.  It typically prints:

exiting generator
joined thread

at most twice before crapping out.  At the time, the `next`
argument to newtracebackobject() is 0xdddddddd, and tracing
back a level shows that, in PyTraceBack_Here(),
frame->f_tstate is entirely filled with 0xdd bytes.

Note that this is not a debug-build obmalloc gimmick!  This
is Microsoft's similar debug-build gimmick for their malloc,
and for some reason Python uses the system malloc directly
to obtain memory for thread states.  The Microsoft debug
free() fills newly-freed memory with 0xdd, which has the
same meaning as the debug-build obmalloc's DEADBYTE (0xdb).

So somebody is accessing a thread state here after it's been
freed.  Best guess is that the generator is getting "cleaned
up" after the thread that created it has gone away, so the
generator's frame's f_tstate is trash.

Note that a PyThreadState (a frame's f_tstate) is /not/ a
Python object -- it's just a raw C struct, and its lifetime
isn't controlled by refcounts.
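
To make the "cleaned up after the creating thread has gone away" scenario
concrete, here is a small Python 3 sketch. It is not the attached hope.py,
and it does not crash on a current interpreter; the names make_gen, worker,
holder and log are made up for the example. It only shows that a
generator's cleanup runs in whichever thread drops the last reference,
which can be a different thread from the one that started it.

```python
import gc
import threading

log = {}
holder = {}

def make_gen():
    log["started_in"] = threading.current_thread().name
    try:
        yield 1
    finally:
        # Cleanup runs in the thread that drops the last reference.
        log["finalized_in"] = threading.current_thread().name

def worker():
    gen = make_gen()
    next(gen)             # start the generator, suspend inside try
    holder["gen"] = gen   # keep it alive past the end of this thread

t = threading.Thread(target=worker, name="worker")
t.start()
t.join()                  # the thread that started the generator is gone

del holder["gen"]         # last reference dropped by the calling thread
gc.collect()              # only matters without reference counting
print(log)
```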

----------------------------------------------------------------------

Comment By: Mike Klaas (mklaas)
Date: 2006-10-19 02:12

Message:
Logged In: YES 
user_id=1611720

Despite Tim's reassurance, I'm afraid that Martin's patch
does in fact prevent the segfault.  Sounds like it also
introduces a memleak.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-10-18 23:57

Message:
Logged In: YES 
user_id=31435

> Can anybody tell why gi_frame *isn't* incref'ed when
> the generator is created?

As documented (in concrete.tex), PyGen_New(f) steals a
reference to the frame passed to it.  Its only call site
(well, in the core) is in ceval.c, which returns immediately
after PyGen_New takes over ownership of the frame the caller
created:

"""
/* Create a new generator that owns the ready to run frame
 * and return that as the value. */
return PyGen_New(f);
"""

In short, that PyGen_New() doesn't incref the frame passed
to it is intentional.

It's possible that the intent is flawed ;-), but offhand I
don't see how.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-10-18 23:05

Message:
Logged In: YES 
user_id=21627

Can you please review/try attached patch? Can anybody tell
why gi_frame *isn't* incref'ed when the generator is created?

----------------------------------------------------------------------

Comment By: Mike Klaas (mklaas)
Date: 2006-10-18 21:47

Message:
Logged In: YES 
user_id=1611720

I cannot yet produce a pure-Python script which reproduces
the problem, but I can give an overview.  There is a
generator running in one thread, an exception being raised
in another thread, and as a consequence, the generator in the
first thread is garbage-collected (triggering an exception
due to the new generator cleanup).  The problem is extremely
sensitive to timing--often the insertion/removal of print
statements, or reordering the code, causes the problem to
vanish, which is confounding my ability to create a simple
test script.

import threading
import time

def getdocs():
    def f():
        <some somewhat time-consuming operation>
    while True:
        f()
        yield None

# -----------------------------------------------------------------------------

class B(object):
    def __init__(self,):
        pass
    def doit(self):
        # must be an instance var to trigger segfault
        self.docIter = getdocs()
        print self.docIter  # this is the generator referred to in the traceback
        for i, item in enumerate(self.docIter):            
            if i > 9:
                break            
        print 'exiting generator'


class A(object):
    """ Process entry point / main thread """
    def __init__(self):
  
        while True:
            try:
                self.func()
            except Exception, e:
                print 'right after raise'

  
    def func(self):        
        b = B()
        thread = threading.Thread(target=b.doit)
        thread.start()
        start_t = time.time()
        while True:
            try:
                if time.time() - start_t > 1:
                    raise Exception
            except Exception:
                print 'right before raise'
                # SIGSEGV here.  If this is changed to
                # 'break', no segfault occurs
                raise


if __name__ == '__main__':
    A()


----------------------------------------------------------------------

Comment By: Mike Klaas (mklaas)
Date: 2006-10-18 21:37

Message:
Logged In: YES 
user_id=1611720

I've produced a simplified traceback with a single generator.
Note that the frame being used in the traceback (#0) is the
same frame being dealloc'd (#11).

The relevant call in traceback.c is:
PyTraceBack_Here(PyFrameObject *frame)
{
        PyThreadState *tstate = frame->f_tstate;
        PyTracebackObject *oldtb = (PyTracebackObject *) tstate->curexc_traceback;
        PyTracebackObject *tb = newtracebackobject(oldtb, frame);

and I can verify that oldtb contains garbage:
(gdb) print frame
$1 = (PyFrameObject *) 0x8964d94
(gdb) print frame->f_tstate
$2 = (PyThreadState *) 0x895b178
(gdb) print $2->curexc_traceback
$3 = (PyObject *) 0x66



#0  0x080e4296 in PyTraceBack_Here (frame=0x8964d94) at
Python/traceback.c:94
#1  0x080b9ab7 in PyEval_EvalFrameEx (f=0x8964d94,
throwflag=1) at Python/ceval.c:2459
#2  0x08101a40 in gen_send_ex (gen=0xb7cca4ac,
arg=0x81333e0, exc=1) at Objects/genobject.c:82
#3  0x08101c0f in gen_close (gen=0xb7cca4ac, args=0x0) at
Objects/genobject.c:128
#4  0x08101cde in gen_del (self=0xb7cca4ac) at
Objects/genobject.c:163
#5  0x0810195b in gen_dealloc (gen=0xb7cca4ac) at
Objects/genobject.c:31
#6  0x080815b9 in dict_dealloc (mp=0xb7cc913c) at
Objects/dictobject.c:801
#7  0x080927b2 in subtype_dealloc (self=0xb7cca76c) at
Objects/typeobject.c:686
#8  0x0806028d in instancemethod_dealloc (im=0xb7d07f04) at
Objects/classobject.c:2285
#9  0x080815b9 in dict_dealloc (mp=0xb7cc90b4) at
Objects/dictobject.c:801
#10 0x080927b2 in subtype_dealloc (self=0xb7cca86c) at
Objects/typeobject.c:686
#11 0x081028c5 in frame_dealloc (f=0x8964a94) at
Objects/frameobject.c:416
#12 0x080e41b1 in tb_dealloc (tb=0xb7cc1fcc) at
Python/traceback.c:34
#13 0x080e41c2 in tb_dealloc (tb=0xb7cc1f7c) at
Python/traceback.c:33
#14 0x08080dca in insertdict (mp=0xb7f99824, key=0xb7ccd020,
hash=1492466088, value=0xb7ccd054)
    at Objects/dictobject.c:394
#15 0x080811a4 in PyDict_SetItem (op=0xb7f99824,
key=0xb7ccd020, value=0xb7ccd054)
    at Objects/dictobject.c:619
#16 0x08082dc6 in PyDict_SetItemString (v=0xb7f99824,
key=0x8129284 "exc_traceback", 
    item=0xb7ccd054) at Objects/dictobject.c:2103
#17 0x080e2837 in PySys_SetObject (name=0x8129284
"exc_traceback", v=0xb7ccd054)
    at Python/sysmodule.c:82
#18 0x080bc9e5 in PyEval_EvalFrameEx (f=0x895f934,
throwflag=0) at Python/ceval.c:2954
#19 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f6ade8,
globals=0xb7fafa44, locals=0x0, 
    args=0xb7cc5ff8, argcount=1, kws=0x0, kwcount=0,
defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:2833
#20 0x08104083 in function_call (func=0xb7cc7294,
arg=0xb7cc5fec, kw=0x0)
    at Objects/funcobject.c:517
#21 0x0805a660 in PyObject_Call (func=0xb7cc7294,
arg=0xb7cc5fec, kw=0x0)
    at Objects/abstract.c:1860


----------------------------------------------------------------------

Comment By: Mike Klaas (mklaas)
Date: 2006-10-18 04:23

Message:
Logged In: YES 
user_id=1611720

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208400192 (LWP 26235)]
0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at
Python/traceback.c:94
94              if ((next != NULL &&
!PyTraceBack_Check(next)) ||
(gdb) bt
#0  0x080e4296 in PyTraceBack_Here (frame=0x9c2d7b4) at
Python/traceback.c:94
#1  0x080b9ab7 in PyEval_EvalFrameEx (f=0x9c2d7b4,
throwflag=1) at Python/ceval.c:2459
#2  0x08101a40 in gen_send_ex (gen=0xb64f880c,
arg=0x81333e0, exc=1) at Objects/genobject.c:82
#3  0x08101c0f in gen_close (gen=0xb64f880c, args=0x0) at
Objects/genobject.c:128
#4  0x08101cde in gen_del (self=0xb64f880c) at
Objects/genobject.c:163
#5  0x0810195b in gen_dealloc (gen=0xb64f880c) at
Objects/genobject.c:31
#6  0x080b9912 in PyEval_EvalFrameEx (f=0x9c2802c,
throwflag=1) at Python/ceval.c:2491
#7  0x08101a40 in gen_send_ex (gen=0xb64f362c,
arg=0x81333e0, exc=1) at Objects/genobject.c:82
#8  0x08101c0f in gen_close (gen=0xb64f362c, args=0x0) at
Objects/genobject.c:128
#9  0x08101cde in gen_del (self=0xb64f362c) at
Objects/genobject.c:163
#10 0x0810195b in gen_dealloc (gen=0xb64f362c) at
Objects/genobject.c:31
#11 0x080815b9 in dict_dealloc (mp=0xb64f4a44) at
Objects/dictobject.c:801
#12 0x080927b2 in subtype_dealloc (self=0xb64f340c) at
Objects/typeobject.c:686
#13 0x0806028d in instancemethod_dealloc (im=0xb796a0cc) at
Objects/classobject.c:2285
#14 0x080815b9 in dict_dealloc (mp=0xb64f78ac) at
Objects/dictobject.c:801
#15 0x080927b2 in subtype_dealloc (self=0xb64f810c) at
Objects/typeobject.c:686
#16 0x081028c5 in frame_dealloc (f=0x9c272bc) at
Objects/frameobject.c:416
#17 0x080e41b1 in tb_dealloc (tb=0xb799166c) at
Python/traceback.c:34
#18 0x080e41c2 in tb_dealloc (tb=0xb4071284) at
Python/traceback.c:33
#19 0x080e41c2 in tb_dealloc (tb=0xb7991824) at
Python/traceback.c:33
#20 0x08080dca in insertdict (mp=0xb7f56824, key=0xb3fb9930,
hash=1492466088, value=0xb3fb9914)
    at Objects/dictobject.c:394
#21 0x080811a4 in PyDict_SetItem (op=0xb7f56824,
key=0xb3fb9930, value=0xb3fb9914) at Objects/dictobject.c:619
#22 0x08082dc6 in PyDict_SetItemString (v=0xb7f56824,
key=0x8129284 "exc_traceback", item=0xb3fb9914)
    at Objects/dictobject.c:2103
#23 0x080e2837 in PySys_SetObject (name=0x8129284
"exc_traceback", v=0xb3fb9914) at Python/sysmodule.c:82
#24 0x080bc9e5 in PyEval_EvalFrameEx (f=0x9c10e7c,
throwflag=0) at Python/ceval.c:2954
#25 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7bbc890,
globals=0xb7bbe57c, locals=0x0, args=0x9b8e2ac, argcount=1,
    kws=0x9b8e2b0, kwcount=0, defs=0xb7b7aed8, defcount=1,
closure=0x0) at Python/ceval.c:2833
#26 0x080bd62a in PyEval_EvalFrameEx (f=0x9b8e16c,
throwflag=0) at Python/ceval.c:3662
#27 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7bbc848,
globals=0xb7bbe57c, locals=0x0, args=0xb7af9d58, argcount=1,
    kws=0x9b7a818, kwcount=0, defs=0x0, defcount=0,
closure=0x0) at Python/ceval.c:2833
#28 0x08104083 in function_call (func=0xb7b79c34,
arg=0xb7af9d4c, kw=0xb7962c64) at Objects/funcobject.c:517
#29 0x0805a660 in PyObject_Call (func=0xb7b79c34,
arg=0xb7af9d4c, kw=0xb7962c64) at Objects/abstract.c:1860
#30 0x080bcb4b in PyEval_EvalFrameEx (f=0x9b82c0c,
throwflag=0) at Python/ceval.c:3846
#31 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7cd6608,
globals=0xb7cd4934, locals=0x0, args=0x9b7765c, argcount=2,
    kws=0x9b77664, kwcount=0, defs=0x0, defcount=0,
closure=0xb7cfe874) at Python/ceval.c:2833
#32 0x080bd62a in PyEval_EvalFrameEx (f=0x9b7751c,
throwflag=0) at Python/ceval.c:3662
#33 0x080bdf70 in PyEval_EvalFrameEx (f=0x9a9646c,
throwflag=0) at Python/ceval.c:3652
#34 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f39728,
globals=0xb7f6ca44, locals=0x0, args=0x9b7a00c, argcount=0,
    kws=0x9b7a00c, kwcount=0, defs=0x0, defcount=0,
closure=0xb796410c) at Python/ceval.c:2833
#35 0x080bd62a in PyEval_EvalFrameEx (f=0x9b79ebc,
throwflag=0) at Python/ceval.c:3662
#36 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f39770,
globals=0xb7f6ca44, locals=0x0, args=0x99086c0, argcount=0,
    kws=0x99086c0, kwcount=0, defs=0x0, defcount=0,
closure=0x0) at Python/ceval.c:2833
#37 0x080bd62a in PyEval_EvalFrameEx (f=0x9908584,
throwflag=0) at Python/ceval.c:3662
#38 0x080bfda3 in PyEval_EvalCodeEx (co=0xb7f397b8,
globals=0xb7f6ca44, locals=0xb7f6ca44, args=0x0, argcount=0,
    kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:2833
#39 0x080bff32 in PyEval_EvalCode (co=0xb7f397b8,
globals=0xb7f6ca44, locals=0xb7f6ca44) at Python/ceval.c:494
#40 0x080ddff1 in PyRun_FileExFlags (fp=0x98a4008,
filename=0xbfffd4a3 "scoreserver.py", start=257,
    globals=0xb7f6ca44, locals=0xb7f6ca44, closeit=1,
flags=0xbfffd298) at Python/pythonrun.c:1264
#41 0x080de321 in PyRun_SimpleFileExFlags (fp=Variable "fp"
is not available.
) at Python/pythonrun.c:870
#42 0x08056ac4 in Py_Main (argc=1, argv=0xbfffd334) at
Modules/main.c:496
#43 0x00a69d5f in __libc_start_main () from /lib/libc.so.6
#44 0x08056051 in _start ()



----------------------------------------------------------------------
