Need help extending a NumPy array in C
Hello! I am maintaining a C++ codebase with extensive ties to Python bindings (via SWIG). One of the features of the code is that we define (in C) a subclass of a NumPy array. Everything worked until we started getting this message with numpy 1.23:

RuntimeError: Object of type <class 'NamedArray'> appears to be C subclassed NumPy array, void scalar, or allocated in a non-standard way.NumPy reserves the right to change the size of these structures. Projects are required to take this into account by either recompiling against a specific NumPy version or padding the struct and enforcing a maximum NumPy version.

My problem is that I don't know how to do either of those things. I would have assumed that whatever I compiled against, it would always be compiled against a specific NumPy version, and I also assumed that 'enforcing a maximum NumPy version' would happen in requirements.txt for the Python package, but that also seems not to be the case. Any hints? Thank you!

-Lucian Smith
Did you rerun SWIG to regenerate the bindings before the 1.23 build? Matti
This should only happen if you compile against NumPy 1.19 and then run your code on NumPy 1.20+. (The warning is quite a lot older than 1.23, so I am surprised that you mention 1.23 specifically.) Of course, you also have to compile with the oldest NumPy version you wish to support, so that would be a normal thing to do.

If you don't find the pattern for extending the `PyArrayObject` struct, maybe you can share a bit of the code (we can do that off-list if necessary). In principle the warning could be triggered by any bad memory access, but it seems unlikely...

The normal reason for this (or what the warning tries to inform you about) is that somewhere in the code you should have the following:

    typedef struct {
        PyArrayObject base;
        /* some information you add: */
        char *names;
    } NamedArrayObject;

    PyTypeObject NamedArray_Type = {
        .tp_basicsize = sizeof(NamedArrayObject),
    };

A quick hack would be just to add `void *numpy_reserved[2];` after the `PyArrayObject`, maybe as:

    typedef struct {
        void *reserved[2];
    } numpy_array_padding;

    typedef struct {
        PyArrayObject base;
        numpy_array_padding padding;  /* space for newer NumPy */
        char *names;
    } NamedArrayObject;

Then you can add a runtime check at module initialization:

    sizeof(PyArrayObject) + sizeof(numpy_array_padding) >= PyArray_Type.tp_basicsize

to raise an error if a similar incompatibility arises in the future. (I say `void *reserved[2]` because we appended a single `void *` twice.)

That padding could be done at run-time as well, but that is a bit trickier and maybe more hassle than it is worth.

Hope this helps you fix it quickly!

Cheers,

Sebastian
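The padding-plus-check idea above can be tried out without NumPy at all. What follows is only a sketch under stated assumptions: `mock_array_fields` is a hypothetical stand-in for the struct that the NumPy headers provide, `check_numpy_layout` is an invented helper name, and its `runtime_basicsize` argument plays the role of `PyArray_Type.tp_basicsize`. In a real extension the comparison would presumably sit in the module initialization function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for NumPy's array struct as seen at compile time.
   The real definition comes from the NumPy headers. */
typedef struct {
    void *ob_head[2];   /* mock of the Python object header */
    char *data;
    int nd;
} mock_array_fields;

/* Two spare pointers that a newer NumPy could grow into. */
typedef struct {
    void *reserved[2];
} numpy_array_padding;

typedef struct {
    mock_array_fields base;
    numpy_array_padding padding;
    char *names;        /* field added by the subclass */
} NamedArrayObject;

/* Returns 0 if the base struct size reported at run time still fits in
   the space reserved at compile time, and -1 otherwise. */
static int check_numpy_layout(size_t runtime_basicsize)
{
    size_t reserved = sizeof(mock_array_fields) + sizeof(numpy_array_padding);

    if (runtime_basicsize > reserved) {
        fprintf(stderr,
                "array struct grew to %zu bytes, only %zu reserved; "
                "recompile needed\n",
                runtime_basicsize, reserved);
        return -1;
    }
    return 0;
}
```

With this layout the base struct can grow by up to two pointers before `names` is at risk; anything larger is reported by the check instead of silently corrupting the subclass fields.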
-Lucian Smith _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: sebastian@sipsolutions.net
Thanks for the information! I've had to work on other projects in the meantime, but was able to get back to this again.

In an effort to wrap my head around the project's code, I realized that I did not have a line like:

#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION

in it. So, I added the line, fixed the errors that resulted, and recompiled. And immediately got segmentation faults.

Investigating, I discovered that the problem was that the pointers to my new member variables change mysteriously to invalid values. The basic flow is:

1) I call PyArray_New with a pointer to my new NamedArrayObject subclass.
2) It does stuff
3) It calls the NamedArrayObject_alloc function we wrote, which creates the new 'rowNames' PyList.
4) It does more stuff
5) It calls the NamedArrayObject_Finalize function we wrote. The 'rowNames' object now has a different, and incorrect, pointer, and at this point if I ever call PyList_Size(self->rowNames), I get a segmentation fault.

If, however, I re-comment out the '#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION' line, everything works fine! Part 4 above doesn't change the 'rowNames' pointer to anything, and it remains valid for the rest of the program.

(If I change 1_7_ to 1_23_, I get identical behavior.)

Normally, I'd be happy to just say 'well, I guess I have to use the deprecated API', but this issue, combined with the fact that I'm getting the "C subclassed NumPy array, void scalar, or allocated in a non-standard way" error message, makes me think that something has changed in a fundamental way since the code was first written in 2014, and that I should do things the new way. I just... don't know what that new way might be ;-)

The upshot is that I have two questions:

1) Does this look like a bug I should file?
2) What areas should I start looking into to change this old code to work with modern NumPy?

and possibly:

3) Is there a worked example of a C-extended NumPy array somewhere I could steal?

The full code is at

https://github.com/sys-bio/roadrunner/blob/develop/wrappers/Python/roadrunne...

if that helps anyone. The branch where I'm trying to change things is at

https://github.com/sys-bio/roadrunner/blob/update-numpy/wrappers/Python/road...

which has a bunch of print statements added, since I was working without a debugger on Windows. The relevant output for a simple Python script that calls this was (for the broken #define version):

```
NA_N 0
Debug: PyObject* rr::NamedArrayObject_alloc(PyTypeObject*, Py_ssize_t)
rownames new ref 0x7ffff7362180
rownames added to object 0x7ffff747e750
rownames size 0
rownames ref 0x7ffff7362180
rownames size 0
Debug: namedArrayObject allocated: 0x7ffff747e750
Debug: namedArrayObject returned obj: 0x7ffff747e750
Debug: Done

Debug: PyObject* rr::NamedArrayObject_Finalize(rr::NamedArrayObject*, PyObject*)
rownames ref 0x555555f4f830
Debug: finalizing object self: 0x7ffff747e750; args 0x555555aca3e0
rownames ref 0x555555f4f830
Debug: NamedArrayObject initialized from constructor. 'None' path taken
Debug: PyObject* rr::NamedArrayObject_Finalize_FromConstructor(rr::NamedArrayObject*)
rownames ref 0x555555f4f830
rownames ref 0x555555f4f830
Debug: Done
```

and for the working non-#define version:

```
NA_N 0
Debug: PyObject* rr::NamedArrayObject_alloc(PyTypeObject*, Py_ssize_t)
rownames new ref 0x7ffff7362140
rownames added to object 0x7ffff752e1e0
rownames size 0
rownames ref 0x7ffff7362140
rownames size 0
Debug: namedArrayObject allocated: 0x7ffff752e1e0
Debug: namedArrayObject returned obj: 0x7ffff752e1e0
Debug: Done

Debug: PyObject* rr::NamedArrayObject_Finalize(rr::NamedArrayObject*, PyObject*)
rownames ref 0x7ffff7362140
Debug: finalizing object self: 0x7ffff752e1e0; args 0x555555aca3e0
rownames ref 0x7ffff7362140
Debug: NamedArrayObject initialized from constructor. 'None' path taken
Debug: PyObject* rr::NamedArrayObject_Finalize_FromConstructor(rr::NamedArrayObject*)
rownames ref 0x7ffff7362140
rownames ref 0x7ffff7362140
Debug: Done
```

Thanks for bearing with me through this long message! And particular thanks to Sebastian for answering my initial question, which I hope to be able to actually address again soon ;-)

-Lucian
On Fri, 2022-08-19 at 23:56 +0000, lpsmith@uw.edu wrote:
In an effort to wrap my head around the project's code, I realized that I did not have a line like:
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
in it. So, I added the line, fixed the errors that resulted, and recompiled. And immediately got segmentation faults.
With it, the fields are hidden completely, which is the intention. But that also means the size is wrong for subclassing. You would have to use `PyArrayObject_fields`. Although that basically circumvents the deprecation, it somewhat makes sense; I guess you should just use it in that one place only (not for actual access to strides). Overall, I am not sure if this will ever help us much, but the solution seems simple here. There should be no fundamental changes, with the exception of the size of `PyArrayObject_fields`.

- Sebastian
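To see why a hidden base struct would corrupt a subclass field such as `rowNames`, the two layouts can be compared side by side. This is a sketch with mock types only: `mock_fields` stands in for the full struct NumPy allocates and writes to, `mock_opaque` for the header-only view of `PyArrayObject` that (as far as I understand the headers) the subclass gets once `NPY_NO_DEPRECATED_API` is defined, and all type and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the full struct that the library allocates and fills in. */
typedef struct {
    void *head[2];      /* mock of the Python object header */
    char *data;
    int nd;
    void *dimensions;
    void *strides;
} mock_fields;

/* Mock of the opaque view: header only, all array fields hidden. */
typedef struct {
    void *head[2];
} mock_opaque;

/* Subclass built on the opaque view: rowNames lands where the library
   keeps its own data pointer. */
typedef struct {
    mock_opaque base;
    void *rowNames;
} BrokenNamedArray;

/* Subclass built on the full struct: rowNames lands after all of the
   library's fields. */
typedef struct {
    mock_fields base;
    void *rowNames;
} FixedNamedArray;

/* 1 if rowNames overlaps memory the library considers its own. */
static int broken_layout_overlaps(void)
{
    return offsetof(BrokenNamedArray, rowNames) < sizeof(mock_fields);
}

/* 1 if rowNames sits entirely past the library's fields. */
static int fixed_layout_is_clear(void)
{
    return offsetof(FixedNamedArray, rowNames) >= sizeof(mock_fields);
}
```

In the broken layout, any write the library makes to its own fields also rewrites `rowNames`, which would be consistent with the pointer changing between the alloc and finalize steps in the debug output.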
That does clear up some things, but it also confuses me in other ways. The fields in question are fields I've added myself as part of the subclass. Are you saying that if I add new fields, those fields are hidden completely? I don't see how I could interact with them at all if that's the case, so I must be misunderstanding something. Among other things, the printing routines need access to them so they can print labels in addition to the values. So it's not just 'checking the size' that's the issue; I also need to be able to set, modify, and read out their values. The core fields could remain hidden (accessible through the normal routines for the class) but the new fields surely wouldn't be? -Lucian
That is exactly how it is. You have the following:

    struct MyArray {
        /* Not PyArrayObject with "new" API: */
        PyArrayObject_fields reserved_for_numpy;
        void *myfield1;
        int myfield2;
    };

In that setup, you should actually never access `reserved_for_numpy`. You should rather simply cast a `MyArray *` to `PyArrayObject *` if/when necessary (or just `PyObject *`). If you need to access certain fields like the shape, you would use `PyArray_DIMS` rather than accessing `->dimensions` directly.

The reason is that, in theory at least, we want to be able to change what is inside `PyArrayObject_fields`. It should be considered opaque! Unfortunately, you still need the size of `PyArrayObject_fields` even if you never access it (to define your struct/class).

Now the problem is, NumPy may change the size of `PyArrayObject_fields` to allow significant improvements for the vast majority of users who do not subclass in C.

One solution when this happens is to recompile. But it is not future-proof (not compatible with future versions of NumPy unless you recompile for them). The future-proof version would be to add code something like:

    struct numpy_space {
        PyArrayObject_fields reserved_for_numpy;
        void *future_space_reserved_for_numpy[2];
    };

    struct MyArray {
        struct numpy_space reserved_for_numpy;
        void *myfield1;
        int myfield2;
    };

which just adds a bit of unused padding space NumPy could use in the future. Now, in principle of course that wastes a bit of space... Also, NumPy could theoretically grow beyond the size of `numpy_space`. So a solution might be to add a check somewhere during init:

    if (PyArray_Type.tp_basicsize > (Py_ssize_t)sizeof(struct numpy_space)) {
        PyErr_SetString(PyExc_RuntimeError,
                "NumPy extended its struct, a recompilation "
                "seems necessary...");
        return NULL;
    }

In principle, there are solutions to do this more dynamically (checking `tp_basicsize` at runtime), but I am not sure it is practical or even works well in C++. You would need to store the size as an offset and always use that offset to access all of your custom fields.

I hope that clarifies things.

Cheers,

Sebastian
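The more dynamic variant mentioned at the end (store the base size as an offset and reach every custom field through it) might look roughly like the following. It is a sketch only, not NumPy API: `MyArrayExtra`, `init_extra_offset`, `my_array_basicsize` and `my_array_extra` are hypothetical names, the argument to `init_extra_offset` plays the role of `PyArray_Type.tp_basicsize`, and the rounding assumes the custom fields need no more than pointer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Only the fields the subclass adds; the base struct is never named. */
typedef struct {
    void *myfield1;
    int myfield2;
} MyArrayExtra;

/* Byte offset of the custom fields, decided once at init time. */
static size_t extra_offset;

/* Record where the custom fields start, rounding the base size reported
   at run time up to pointer alignment (an assumption of this sketch). */
static void init_extra_offset(size_t runtime_basicsize)
{
    size_t align = sizeof(void *);
    extra_offset = (runtime_basicsize + align - 1) / align * align;
}

/* Total object size the subclass type would have to request. */
static size_t my_array_basicsize(void)
{
    return extra_offset + sizeof(MyArrayExtra);
}

/* The only way the rest of the code reaches the custom fields. */
static MyArrayExtra *my_array_extra(void *self)
{
    return (MyArrayExtra *)((char *)self + extra_offset);
}
```

The cost of this design is that the subclass can no longer be described by a single C struct, so every access has to go through the accessor; the benefit is that a larger base struct only moves the offset and does not require a recompile.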
participants (3)
- lpsmith@uw.edu
- Matti Picus
- Sebastian Berg