Embedding Python crash on PyTuple_New
Arnaud Loonstra
arnaud at sphaero.org
Tue Nov 23 15:25:35 EST 2021
On 23-11-2021 18:31, MRAB wrote:
> On 2021-11-23 16:04, Arnaud Loonstra wrote:
>> On 23-11-2021 16:37, MRAB wrote:
>>> On 2021-11-23 15:17, MRAB wrote:
>>>> On 2021-11-23 14:44, Arnaud Loonstra wrote:
>>>>> On 23-11-2021 15:34, MRAB wrote:
>>>>>> On 2021-11-23 12:07, Arnaud Loonstra wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've had Python embedded successfully in a program up until now,
>>>>>>> but I'm now running into weird GC-related segfaults. I'm currently
>>>>>>> trying to debug this, but my understanding of CPython limits me here.
>>>>>>>
>>>>>>> I'm creating a tuple in C, but after a while it crashes on creation.
>>>>>>> It doesn't make sense, which makes me wonder whether something else
>>>>>>> is happening. Could it be that it just crashes here because the GC is
>>>>>>> cleaning up stuff completely unrelated to the allocation of the new
>>>>>>> tuple? How can I troubleshoot this?
>>>>>>>
>>>>>>> I've got CPython compiled with --with-valgrind --without-pymalloc
>>>>>>> --with-pydebug
>>>>>>>
>>>>>>> In C I'm creating a tuple with the following method:
>>>>>>>
>>>>>>> static PyObject *
>>>>>>> s_py_zosc_tuple(pythonactor_t *self, zosc_t *oscmsg)
>>>>>>> {
>>>>>>>     assert(self);
>>>>>>>     assert(oscmsg);
>>>>>>>     char *format = zosc_format(oscmsg);
>>>>>>>
>>>>>>>     PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format));
>>>>>>>
>>>>>>> It segfaults here (frame 16) after 320 times (consistently)
>>>>>>>
>>>>>>>
>>>>>>> 1 __GI_raise raise.c 49 0x7ffff72c4e71
>>>>>>> 2 __GI_abort abort.c 79 0x7ffff72ae536
>>>>>>> 3 fatal_error pylifecycle.c 2183 0x7ffff7d84b4f
>>>>>>> 4 Py_FatalError pylifecycle.c 2193 0x7ffff7d878b2
>>>>>>> 5 _PyObject_AssertFailed object.c 2200 0x7ffff7c93cf2
>>>>>>> 6 visit_decref gcmodule.c 378 0x7ffff7dadfd5
>>>>>>> 7 tupletraverse tupleobject.c 623 0x7ffff7ca3e81
>>>>>>> 8 subtract_refs gcmodule.c 406 0x7ffff7dad340
>>>>>>> 9 collect gcmodule.c 1054 0x7ffff7dae838
>>>>>>> 10 collect_with_callback gcmodule.c 1240 0x7ffff7daf17b
>>>>>>> 11 collect_generations gcmodule.c 1262 0x7ffff7daf3f6
>>>>>>> 12 _PyObject_GC_Alloc gcmodule.c 1977 0x7ffff7daf4f2
>>>>>>> 13 _PyObject_GC_Malloc gcmodule.c 1987 0x7ffff7dafebc
>>>>>>> 14 _PyObject_GC_NewVar gcmodule.c 2016 0x7ffff7daffa5
>>>>>>> 15 PyTuple_New tupleobject.c 118 0x7ffff7ca4da7
>>>>>>> 16 s_py_zosc_tuple pythonactor.c 366 0x55555568cc82
>>>>>>> 17 pythonactor_socket pythonactor.c 664 0x55555568dac7
>>>>>>> 18 pythonactor_handle_msg pythonactor.c 862 0x55555568e472
>>>>>>> 19 pythonactor_handler pythonactor.c 828 0x55555568e2e2
>>>>>>> 20 sphactor_actor_run sphactor_actor.c 855 0x5555558cb268
>>>>>>> ... <More>
>>>>>>>
>>>>>>> Any pointer really appreciated.
>>>>>>>
>>>>>> You're creating a tuple that'll have the same number of members as
>>>>>> the length of a string? That looks strange to me.
>>>>>>
>>>>>> How are you setting the tuple's members?
>>>>>
>>>>> It's from a serialisation format called OSC. The string describes the
>>>>> types of the bytes; every character is a type.
>>>>>
>>>>> I'm creating the tuple as follows:
>>>>>
>>>>> PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );
>>>>>
>>>>> Then I iterate over the OSC message using the format string (showing
>>>>> only the handling of an int, 'i'):
>>>>>
>>>>> char type = '0';
>>>>> Py_ssize_t pos = 0;
>>>>> const void *data = zosc_first(oscmsg, &type);
>>>>> while(data)
>>>>> {
>>>>>     switch (type)
>>>>>     {
>>>>>         case('i'):
>>>>>         {
>>>>>             int32_t val = 9;
>>>>>             int rc = zosc_pop_int32(oscmsg, &val);
>>>>>             assert(rc == 0);
>>>>>             PyObject *o = PyLong_FromLong((long)val);
>>>>>             assert( o );
>>>>>             rc = PyTuple_SetItem(rettuple, pos, o);
>>>>>             assert(rc == 0);
>>>>>             break;
>>>>>         }
>>>>>
>>>>> Full code is here:
>>>>>
>>>>> https://github.com/hku-ect/gazebosc/blob/822452dfa27201db274d37ce09e835d98fe500b2/Actors/pythonactor.c#L360
>>>>>
>>>>>
>>>> Looking at that code, you have:
>>>>
>>>> PyObject *o = Py_BuildValue("s#", str, 1);
>>>>
>>>> What I'd check is the type of the 1 that you're passing. Wouldn't the
>>>> compiler assume that it's an int?
>>>>
>>>> The format string tells the function to expect a Py_ssize_t, but how
>>>> would the compiler know that?
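
The concern above comes from how C handles variadic calls: the callee decides which type to read, while the caller only applies the default argument promotions, so a bare 1 is passed as an int. Below is a minimal sketch in plain C that does not use the CPython API. `total_length` and `example_total` are hypothetical names, and ptrdiff_t stands in for Py_ssize_t. As far as the CPython documentation goes, whether "#" lengths are read as int or Py_ssize_t on Python 3.9 and older depends on whether PY_SSIZE_T_CLEAN is defined before including Python.h, so it is worth checking which case applies to the code in question.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums `count` lengths passed as variadic arguments.
   Like a "#" length in a format string, each one is read as a signed
   size type; ptrdiff_t stands in for Py_ssize_t in this sketch. */
static ptrdiff_t
total_length(int count, ...)
{
    va_list args;
    ptrdiff_t total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, ptrdiff_t);   /* the callee picks the type */
    va_end(args);
    return total;
}

/* Correct call: every length is cast, so the type that is passed matches
   the type that is read. A bare 1 would be passed as an int, which is
   undefined behaviour because the types do not match. */
static ptrdiff_t
example_total(void)
{
    return total_length(3, (ptrdiff_t) 1, (ptrdiff_t) 2, (ptrdiff_t) 39);
}
```

The same idea applied to the line quoted above would be to write the length as (Py_ssize_t) 1 when the format expects a Py_ssize_t.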
>>>>
>>> Looking at https://www.mankier.com/3/zosc, it says for 'T' and 'F'
>>> "(no value required)", but you're doing:
>>>
>>> int rc = zosc_pop_bool(oscmsg, &bl);
>>>
>>> If no value is required, is there a bool there to be popped?
>>
>> The value is only required to set a user provided boolean to the value
>> in the message. There's no boolean value encoded in the message, only
>> the T and F in the format string.
>>
>> With regard to Py_BuildValue("s#", str, 1): that's a valid point.
>> I'll fix that. However, in the segfaults I'm testing, that code is not
>> touched.
>
> You can use "C" as a format string for Py_BuildValue to convert a C int
> representing a character to a Python string.
>
>> I'm now testing different parts of the code to see if it runs stable.
>> I've found it runs stable if I do not process the returned tuple.
>>
>> PyObject *pReturn = PyObject_CallMethod(self->pyinstance,
>> "handleSocket", "sOsss",
>> oscaddress,
>> py_osctuple,
>> ev->type, ev->name, strdup(ev->uuid));
>> Py_XINCREF(pReturn);
>>
> Why the Py_XINCREF? PyObject_CallMethod returns a new reference. The
> Py_DECREF that you do later won't destroy the object because of that
> additional Py_XINCREF, so that's a memory leak.
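
The arithmetic behind that leak can be sketched with a toy reference counter in plain C. This is a model only, not the CPython API: `toy_obj`, `toy_new`, `toy_incref`, `toy_decref` and `leftover_after_extra_incref` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted object (not CPython's PyObject). */
typedef struct {
    int refcount;
} toy_obj;

/* Like a call that returns a new reference: the caller owns one count. */
static toy_obj *
toy_new(void)
{
    toy_obj *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    o->refcount = 1;
    return o;
}

static void
toy_incref(toy_obj *o)
{
    o->refcount++;
}

/* Drops one count and frees the object when the last one is gone. */
static void
toy_decref(toy_obj *o)
{
    if (--o->refcount == 0)
        free(o);
}

/* Returns the count left over after one extra incref and one decref. */
static int
leftover_after_extra_incref(void)
{
    toy_obj *o = toy_new();     /* count 1: the caller's own reference */
    toy_incref(o);              /* count 2: the superfluous incref */
    toy_decref(o);              /* count 1: the object is still alive */
    int leftover = o->refcount;
    toy_decref(o);              /* sketch only: release it so the demo does not leak */
    return leftover;
}
```

In the model, the count is still 1 after the single decref, which is the situation described above: nothing ever releases that last count, so the object stays allocated.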
>
>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L673
>>
>>
>> and a bit further in the code I convert the Python tuple to an OSC
>> message:
>>
>> zosc_t *retosc = s_py_zosc(pAddress, pData);
>>
>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L732
>>
>>
>> If I change that line to:
>>
>> zosc_t *retosc = zosc_create("/temp", "ii", 32, 64);
>>
>> It runs stable.
>>
>> I would turn my attention to the s_py_zosc function, but I'm not sure.
>> Since the errors are GC-related, could they be caused anywhere?
>>
> Basically, yes, but I wouldn't be surprised if it were due to too few
> INCREFs or too many DECREFs somewhere.
>
>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286
>>
>>
> Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);"
> after "zosc_pop_float" or "zosc_pop_double".
Thanks for those pointers! I think your intuition is right. I might have
found the bugger. In s_py_zosc I call Py_DECREF on pAddress and pData.
However, they are acquired with PyTuple_GetItem, which returns a borrowed
reference. I think pAddress and pData are then 'decrefed' a second time
when the pReturn tuple that contains them is 'decrefed'?

I'm testing it now, and it has been running stable for a while.

Preparing a PR: https://github.com/hku-ect/gazebosc/pull/181
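
For illustration, here is a toy model in plain C of why releasing a borrowed reference is harmful. It is a sketch under stated assumptions, not the CPython API and not the code from the PR: `toy_item`, `toy_tuple`, `toy_tuple_get`, `toy_decref` and `container_dangles` are hypothetical names, and the model only marks an item as dead instead of freeing it, so the demo itself stays well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins (not the CPython API): an item with a reference count and
   a one-slot container that owns one reference to it. */
typedef struct {
    int  refcount;
    bool alive;      /* false once the count has dropped to zero */
} toy_item;

typedef struct {
    toy_item *slot;
} toy_tuple;

static void
toy_decref(toy_item *it)
{
    if (--it->refcount == 0)
        it->alive = false;   /* a real implementation would free it here */
}

/* Returns a borrowed reference: the count is NOT incremented,
   as with PyTuple_GetItem. */
static toy_item *
toy_tuple_get(toy_tuple *t)
{
    return t->slot;
}

/* Returns true if the container is left pointing at a dead item. */
static bool
container_dangles(bool decref_borrowed)
{
    toy_item  item  = { 1, true };   /* the single count belongs to the tuple */
    toy_tuple tuple = { &item };

    toy_item *borrowed = toy_tuple_get(&tuple);
    if (decref_borrowed)
        toy_decref(borrowed);        /* the bug: releasing a count we never owned */

    return !tuple.slot->alive;
}
```

In the real API the usual options are either to drop the Py_DECREF calls on the borrowed pointers, or to Py_INCREF them first if they need to outlive the tuple and then Py_DECREF them when done.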