[Cython] Problem with Py_buffer struct definition in Builtin.py

Fri Mar 2 10:45:02 CET 2012

Hi,

the builtin Py_buffer struct type is currently defined as follows:

"""
builtin_structs_table = [
    ('Py_buffer', 'Py_buffer',
     [("buf",        PyrexTypes.c_void_ptr_type),
      ("obj",        PyrexTypes.py_object_type),
      ("len",        PyrexTypes.c_py_ssize_t_type),
      ...
"""

I hadn't noticed this before, but when you actually use it in a
"__getbuffer__()" special method, you have to assign the buffer owner (i.e.
self) to the .obj field, which is currently defined as "object". Meaning,
Cython will do reference counting for it. That's fine for the self
reference being assigned to it. However, when "__getbuffer__()" gets
called, .obj is not assigned, i.e. it may be NULL, or it may contain
garbage, depending on the caller. So, Cython will generate broken code for
it when trying to DECREF the original value as part of the assignment.

"""
 *         buffer.obj = self             # <<<<<<<<<<<<<<
 */
  __Pyx_INCREF(((PyObject *)__pyx_v_self));
  __Pyx_GIVEREF(((PyObject *)__pyx_v_self));
  __Pyx_GOTREF(__pyx_v_buffer->obj);
  __Pyx_DECREF(__pyx_v_buffer->obj);           // please crash for me
  __pyx_v_buffer->obj = ((PyObject *)__pyx_v_self);
"""

I see a couple of ways to fix this:

1) Change the field to PyObject* instead of object (and finally make
PyObject a known type in Cython). Problem: may break existing code,
although that's currently broken anyway. Also makes it more annoying for
users who then have to do the manual cast+INCREF dance.

2) Special case this field and do not do any reference counting for its
current value. PyBuffer_Release() will properly do the XDECREF cleanup
anyway, and it's unlikely that user code will assign to it more than once.
Problem: very unexpected behaviour for users in case they ever *do*
multiple assignments.

3) Wrap the user provided "__getbuffer__()" special method in code that
always assigns "self" to the .obj field before entering the user code and
also clears it on error return. That would mean that it will always contain
a valid value, so that ref-counting will work as expected. Problem: this
complicates the implementation and may not be obvious for users (who don't
read the documentation). It also won't fix any code that uses Py_buffer
outside of "__getbuffer__()" or that calls "__releasebuffer__()" directly
without going through PyBuffer_Release(). However, this actually makes it
easier for users, who can then even skip setting the value in all but some
very rare cases and just care about the actual buffer setup in their
"__getbuffer__()" code.

I'm leaning towards 3). What do you think?

I also think that this is worth fixing for 0.16 if possible, because it
currently generates crash code from innocent and correct looking user code.

Stefan