[Numpy-discussion] tiny patch + Playing with strings and my own array descr (PyArray_STRING, PyArray_OBJECT).

Wed Jun 21 12:15:20 EDT 2006

On Tuesday 20 June 2006 11:24, Travis Oliphant wrote:
> Matthieu Perrot wrote:
> > hi,
> >
> > I need to handle strings held in a numpy array whose data belong to a C
> > structure. There are several possible answers to this problem:
> >   1) use a numpy array of strings (PyArray_STRING) and so a (char *)
> > object in C. It works as is, but you need to define a maximum size to
> > your strings because your set of strings is contiguous in memory.
> >   2) use a numpy array of objects (PyArray_OBJECT), and wrap each «C
> > string» with a python object, using PyStringObject for example. The
> > problem is then that there are as many wrappers as data elements, and I
> > believe the data can't be shared when you create a PyStringObject from a
> > (char *) with PyString_FromStringAndSize, for example.
> >
> >
> > Now I will present a third way, which lets you use strings with no size
> > limit (unlike solution 1) and does not create wrappers until you really
> > need them (on demand/access).
> >
> > First, for convenience, we will use the (char **) type in C to build an
> > array of string pointers (as suggested in solution 2). Now, the game is
> > to make it work with the numpy API, and to use it in python through a
> > python array. Basically, I want behaviour very similar to that of arrays
> > of PyObject, where the data are not contiguous, only their addresses
> > are. So, the idea is to create a new array descr based on PyArray_OBJECT
> > and change its getitem/setitem functions to deal with my own data.
> >
> > I expected numpy to work with this convenient array descr, but it fails
> > because PyArray_Scalar (arrayobject.c) doesn't call the descriptor's
> > getitem function (in the PyArray_OBJECT case) but instead runs 2 lines
> > which have been copy/pasted from the OBJECT_getitem function. Here is my
> > small patch:
> > replace (arrayobject.c:983-984):
> >           Py_INCREF(*((PyObject **)data));
> >           return *((PyObject **)data);
> > by :
> >           return descr->f->getitem(data, base);
> >
> > I played a lot with my new numpy array after this change and noticed that
> > many uses work:
>
> This is an interesting solution.  I was not considering it, though, and
> so I'm not surprised you have problems.  You can register new types but
> basing them off of PyArray_OBJECT can be problematic because of the
> special-casing that is done in several places to manage reference counting.
>
> You are supposed to register your own data-types and get your own
> typenumber.  Then you can define all the functions for the entries as
> you wish.
>
> Riding on the back of PyArray_OBJECT may work if you are clever, but it
> may fail mysteriously as well because of a reference count snafu.
>
> Thanks for the tests and bug-reports.  I have no problem changing the
> code as you suggest.
>
> -Travis
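
To make the layouts discussed in the quoted message concrete, here is a
minimal sketch in plain C, with no numpy involved. It contrasts the
contiguous fixed-width storage of solution 1 with a (char **) array whose
slots are read and written through getitem/setitem-style helpers. The names
fixed_set, fixed_get, slot_get and slot_set are hypothetical and are not
part of the numpy C-API; a real getitem would presumably build a Python
string object from the slot on access instead of returning a C pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Solution 1 layout: strings packed contiguously, itemsize bytes each.
 * Longer values are truncated, shorter ones are padded with '\0'. */
void fixed_set(char *buf, size_t itemsize, size_t i, const char *value)
{
    size_t len = strlen(value);
    if (len > itemsize)
        len = itemsize;                       /* truncate to the slot size */
    memset(buf + i * itemsize, 0, itemsize);  /* pad the slot with '\0' */
    memcpy(buf + i * itemsize, value, len);
}

/* Copy element i into out, which must hold itemsize + 1 bytes. */
void fixed_get(const char *buf, size_t itemsize, size_t i, char *out)
{
    assert(out != NULL);
    memcpy(out, buf + i * itemsize, itemsize);
    out[itemsize] = '\0';
}

/* Third way: each element slot holds a (char *); only the pointers are
 * contiguous. 'data' points at the slot, as in a descr getitem. */
const char *slot_get(const void *data)
{
    assert(data != NULL);
    const char *s = *(char *const *)data;
    return s ? s : "";                        /* empty slot reads as "" */
}

/* Replace the string stored in the slot; returns 0 on success, -1 on
 * allocation failure (the old value is kept in that case). */
int slot_set(void *data, const char *value)
{
    assert(data != NULL);
    size_t len = strlen(value);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len + 1);
    free(*(char **)data);                     /* free(NULL) is a no-op */
    *(char **)data = copy;
    return 0;
}
```

With the pointer layout no maximum string size has to be chosen up front,
at the cost of one allocation per element in this sketch.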

Thanks for applying my suggestions.

I think you are suggesting this kind of declaration:
        PyArray_Descr   *descr = PyArray_DescrNewFromType(PyArray_VOID);
        descr->f->getitem = (PyArray_GetItemFunc *) my_getitem;
        descr->f->setitem = (PyArray_SetItemFunc *) my_setitem;
        descr->elsize = sizeof(char *);
        PyArray_RegisterDataType(descr);

Without the last line, you are right: it works and it follows the C-API
way. But if I register this array descr, the typenumber is bigger than
what PyTypeNum_ISFLEXIBLE considers to be a flexible type, so the
returned scalar object is badly formed. I then get a segmentation fault
later, because the created voidscalar has a null descr pointer.
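
The range problem can be sketched in plain C as follows. This is only an
illustration of the mechanism, not numpy code: the SK_* constants, the
sk_is_flexible check and the sk_register counter are hypothetical stand-ins,
and the numeric values are assumptions chosen to mimic a closed "flexible"
range with user-registered types numbered above it.

```c
#include <assert.h>

/* Assumed, illustrative type numbers (not taken from numpy headers). */
enum {
    SK_STRING  = 18,
    SK_UNICODE = 19,
    SK_VOID    = 20,
    SK_USERDEF = 256   /* first number handed out to registered types */
};

#define SK_MAX_USER_TYPES 8

static int sk_num_user_types = 0;

/* Range test in the spirit of a "flexible type" check: only type numbers
 * between SK_STRING and SK_VOID pass. */
int sk_is_flexible(int type_num)
{
    return type_num >= SK_STRING && type_num <= SK_VOID;
}

/* Stand-in for registering a new data type: returns a fresh type number
 * at or above SK_USERDEF. */
int sk_register(void)
{
    assert(sk_num_user_types < SK_MAX_USER_TYPES);
    return SK_USERDEF + sk_num_user_types++;
}
```

A descr derived from the void type passes the check while it still carries
the void type number, but fails it once registration assigns it a new one,
so any code path guarded by the range test is skipped for the registered
type.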
-- 
Matthieu Perrot