
It seems this could be a deadlock somewhere? Line 835 of _dotblas.c (frame #4 in the stack below) is where the GIL is released so that other threads are allowed to run. I had another problem today with the same instruction, ((x-self.loc) * (1/self.scale)): the computation just hung. I'm trying to test without the thread lock.
Matthieu
2007/10/19, Matthieu Brucher <matthieu.brucher@gmail.com>:
Here is an excerpt of the stack with the numpy svn of Wednesday:
#0 0x40000402 in __kernel_vsyscall ()
#1 0x00b8e382 in sem_post@GLIBC_2.0 () from /lib/libpthread.so.0
#2 0x080fe5b7 in PyThread_release_lock (lock=0x80d0fb2) at Python/thread_pthread.h:374
#3 0x080d0fb2 in PyEval_SaveThread () at Python/ceval.c:299
#4 0x4064ec7a in dotblas_innerproduct (dummy=0x0, args=0x960522c) at numpy/core/blasdot/_dotblas.c:835
#5 0x0811ecf1 in PyCFunction_Call (func=0x960522c, arg=0x80cfec7, kw=0x942068c) at Objects/methodobject.c:73
#6 0x080cfec7 in call_function (pp_stack=0x0, oparg=1) at Python/ceval.c:3564
#7 0x080cb19a in PyEval_EvalFrameEx. () at Python/ceval.c:2267
#8 0x080ca848 in PyEval_EvalCodeEx. () at Python/ceval.c:2831
#9 0x0811de5b in function_call (func=0x0, arg=0x960516c, kw=0x965b148) at Objects/funcobject.c:517
#10 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x960516c, kw=0x8061a66) at Objects/abstract.c:1860
#11 0x08061a66 in instancemethod_call (func=0x95f16ac, arg=0x94f5824, kw=0x805bdb6) at Objects/classobject.c:2497
#12 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x95f16ac, kw=0x80a86e9) at Objects/abstract.c:1860
#13 0x080a86e9 in slot_tp_call (self=0xbf975c78, args=0x0, kwds=0x805bdb6) at Objects/typeobject.c:4633
#14 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x95f16ac, kw=0x80d017b) at Objects/abstract.c:1860
#15 0x080d017b in do_call (func=0x0, pp_stack=0x1, na=1, nk=154955456) at Python/ceval.c:3775
#16 0x080cfd56 in call_function (pp_stack=0x0, oparg=1) at Python/ceval.c:3587
#17 0x080cb19a in PyEval_EvalFrameEx. () at Python/ceval.c:2267
#18 0x080ca848 in PyEval_EvalCodeEx. () at Python/ceval.c:2831
#19 0x0811de5b in function_call (func=0x0, arg=0x960520c, kw=0x30) at Objects/funcobject.c:517
#20 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x960520c, kw=0x8061a66) at Objects/abstract.c:1860
#21 0x08061a66 in instancemethod_call (func=0x95ecbac, arg=0x94f57d4, kw=0x805bdb6) at Objects/classobject.c:2497
#22 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x95ecbac, kw=0x80a86e9) at Objects/abstract.c:1860
#23 0x080a86e9 in slot_tp_call (self=0xbf97609c, args=0x0, kwds=0x805bdb6) at Objects/typeobject.c:4633
#24 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x95ecbac, kw=0x80d017b) at Objects/abstract.c:1860
#25 0x080d017b in do_call (func=0x0, pp_stack=0x1, na=1, nk=156957888) at Python/ceval.c:3775
#26 0x080cfd56 in call_function (pp_stack=0x0, oparg=1) at Python/ceval.c:3587
#27 0x080cb19a in PyEval_EvalFrameEx. () at Python/ceval.c:2267
#28 0x080ca848 in PyEval_EvalCodeEx. () at Python/ceval.c:2831
#29 0x0811de5b in function_call (func=0x0, arg=0x960518c, kw=0x9548a1) at Objects/funcobject.c:517
#30 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x960518c, kw=0x8061a66) at Objects/abstract.c:1860
#31 0x08061a66 in instancemethod_call (func=0x960526c, arg=0x95fc874, kw=0x805bdb6) at Objects/classobject.c:2497
#32 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x960526c, kw=0x80a86e9) at Objects/abstract.c:1860
#33 0x080a86e9 in slot_tp_call (self=0xbf9764c0, args=0x0, kwds=0x805bdb6) at Objects/typeobject.c:4633
#34 0x0805bdb6 in PyObject_Call (func=0x0, arg=0x960526c, kw=0x80d017b) at Objects/abstract.c:1860
#35 0x080d017b in do_call (func=0x0, pp_stack=0x1, na=1, nk=157640136) at Python/ceval.c:3775
#36 0x080cfd56 in call_function (pp_stack=0x0, oparg=1) at Python/ceval.c:3587
#37 0x080cb19a in PyEval_EvalFrameEx. () at Python/ceval.c:2267
#38 0x080ca848 in PyEval_EvalCodeEx. () at Python/ceval.c:2831
#39 0x0811de5b in function_call (func=0x0, arg=0x95f260c, kw=0x0) at Objects/funcobject.c:517
#40 0x0805bdb6 in PyObject_Call (func=0x9609934, arg=0x95f260c, kw=0x8061a66) at Objects/abstract.c:1860
It seems the bug could be somewhere in the handling of the dotblas module?
Matthieu
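
When a computation hangs like this, it can help to see the Python-level stack as well as the gdb one. The following is only a minimal sketch, assuming a current Python 3 with the standard-library faulthandler module (which did not exist when this thread was written). The names run_with_watchdog and slow_computation are hypothetical; slow_computation merely stands in for the call that hung.

```python
import faulthandler
import sys
import time


def run_with_watchdog(func, timeout, log=sys.stderr):
    """Call func(); if it is still running after `timeout` seconds,
    dump the stacks of all threads to `log` without killing the process."""
    # `log` must be a real file (it needs a file descriptor).
    faulthandler.dump_traceback_later(timeout, exit=False, file=log)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()


def slow_computation():
    # Hypothetical stand-in for the expression that hung in the report.
    time.sleep(0.6)
    return "done"
```

Calling run_with_watchdog(slow_computation, 0.2) writes a traceback naming slow_computation to stderr after 0.2 seconds and still returns "done". With a real deadlock the call never returns, but the dump shows which Python line each thread is stuck on.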
2007/10/15, Travis E. Oliphant <oliphant@enthought.com>:
Matthieu Brucher wrote:
The problem is that there is a data-type reference counting error somewhere that is attempting to deallocate the built-in data-type 'l'
That's what I supposed, but I couldn't find the reason why it wanted to do this.
It's not really a Python error but a logged message. The code won't let you deallocate the built-ins, but it will tell you that something tried to.
Reference counting on data-types is easy to get wrong (particularly with Pyrex extension modules) because most calls consume a reference to the data-type (if they return an object that contains a reference to the data-type).
It is a bug, and it would be nice to figure it out, but that would require the code that caused it.
I've updated my numpy version to the latest svn, and the behaviour seems to be different (more warnings). I'll try to give more information about the error, but providing the whole code will not be simple: it uses a big data file that seems to trigger the error, whereas with other data files the error didn't show up :(
There are two types of errors that can occur with reference counting on data-types.
1) There are too many DECREF's --- this gets us to the error quickly and is usually easy to reproduce
2) There are too many INCREF's (the reference count keeps going up until the internal counter wraps around to 0 and deallocation is attempted) --- this error is harder to reproduce and usually takes a while before it happens in the code.
-Travis
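
The second kind of error (too many INCREF's) can sometimes be spotted from Python long before the counter wraps, by watching the reference count of the suspect object across repeated calls. The following is only a minimal sketch using sys.getrefcount. The helper name refcount_drift and the deliberately leaky function are hypothetical and exist purely to illustrate the idea. Too many DECREF's cannot be simulated from pure Python, so only the leaking case is shown.

```python
import sys


def refcount_drift(func, watched, repeats=1000):
    """Return how much the reference count of `watched` changes
    over `repeats` calls of func()."""
    func()  # warm-up call, so one-time caches are not counted as drift
    before = sys.getrefcount(watched)
    for _ in range(repeats):
        func()
    return sys.getrefcount(watched) - before


token = object()   # stand-in for the object whose count is suspect
kept = []


def clean():
    tmp = [token]  # takes a reference...
    del tmp        # ...and gives it back


def leaky():
    kept.append(token)  # keeps one extra reference per call


print(refcount_drift(clean, token, 100))
print(refcount_drift(leaky, token, 100))
```

With NumPy, one would pass the suspect data-type (for example numpy.dtype('l')) as `watched` and the computation under suspicion as `func`; a drift that grows with `repeats` suggests an extra INCREF somewhere in the extension code being called.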
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
--
French PhD student
Website : http://miles.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92