Is it OK to extend the ndarray structure?
Hi all,

Just curious: does anyone have reservations about extending the ndarray struct (and the void scalar one)?

The reason is that I am starting to dislike the way we handle the buffer interface. Due to backward-compatibility issues, we cannot use the "right" way to free the buffer information. Instead, we store lists of pointers in a dictionary. To me this seems convoluted, and it is annoying because it adds a dictionary lookup to every single array deletion (and an insertion for every buffer creation). It also looks a bit like a memory leak in some cases (although that probably only annoys me, and only when running valgrind).

It seems it would be much simpler to tag the buffer info onto the array object itself, which, however, would require extending the array object by a single pointer [1].

Extending is in theory an ABI break if anyone subclasses ndarray from C (extending the struct) and does not very carefully anticipate the possibility. I am not even sure we support that, but it is hard to be sure...

Cheers,

Sebastian

[1] The size difference should not matter IMO, and with Cython's memoryviews, buffers are not an uncommon feature in any case. For the void scalar the relative increase is a bit bigger, but void scalars are also very rare. (I thought of using weak references, but the relevant CPython API seems not very fleshed out, or at least not well documented, so I am not sure about that.)
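To make the proposal concrete, here is a minimal sketch of the idea. The first fields follow the public `PyArrayObject_fields` layout; the trailing pointer, its name, and the struct name are hypothetical, added only to illustrate the "tag the buffer info onto the array object" suggestion, not NumPy's actual header.

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Hedged sketch, not NumPy's real struct: append one pointer so the
 * buffer bookkeeping hangs off the array object itself and can be
 * freed directly in tp_dealloc, instead of being looked up in (and
 * removed from) a module-level dictionary on every array deletion. */
typedef struct {
    PyObject_HEAD
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
    void *buffer_info;   /* hypothetical new field: cached Py_buffer bookkeeping */
} ExtendedArrayObject_sketch;
```

Appending the pointer at the end keeps all existing field offsets unchanged; the only thing that grows is `sizeof()` the struct, which is exactly why the question below is about C-level subclasses that bake that size into their own layout.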
On Fri, May 22, 2020 at 10:14 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
I had no idea whether we support that, so I crowdsourced some input.

Feedback from Travis: "I would be quite sure there are extensions out there that do this. Please just break the ABI and change the version number to do that."

Feedback from Pearu: "ndarray itself (PyArrayObject) is a kind-of subclass of PyObject. See https://www.python.org/dev/peps/pep-0253. Something like the following might work:

```c
typedef struct {
    PyArrayObject super;
    /* insert extensions here */
} MyPyArrayObject;
```
"

Cheers,
Ralf
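To spell out why this pattern is affected, here is a hedged sketch of what such a C-level subclass typically looks like. The module name, type name, and `extra` member are made up for illustration; only the general PEP 253 layout rules are assumed.

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Hedged sketch of the subclassing pattern Pearu describes: a C
 * extension embeds PyArrayObject as its first member and appends its
 * own fields.  If NumPy later appends a pointer to PyArrayObject,
 * sizeof(super) grows, and a binary compiled against the old header
 * places `extra` inside memory the new NumPy now owns -- that is the
 * ABI concern in this thread. */
typedef struct {
    PyArrayObject super;   /* must come first (PEP 253 layout) */
    PyObject *extra;       /* hypothetical subclass-specific member */
} MyPyArrayObject;

static PyTypeObject MyPyArray_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "mymod.MyArray",
    .tp_basicsize = sizeof(MyPyArrayObject),  /* baked in at compile time */
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    /* .tp_base is set to &PyArray_Type at module init, after import_array() */
};
```

The compile-time `tp_basicsize` is the crux: it does not track a NumPy that grows its base struct at runtime, so such extensions need a recompile (or the padding trick sketched further down) when the base layout changes.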
On Wed, 2020-05-27 at 18:36 +0200, Ralf Gommers wrote:
Yes, it is a break if someone subclasses from C (or probably Cython) without being very careful (and we do not help with that well right now). But the ABI break is very mild in the sense that it is very easy to recompile such a library to be compatible with *both* old and new versions [1]. And I still think it will be super rare (which I would love to check [2]).

In either case, though, I have been pretty convinced for a long time now that a major version is becoming more and more something we should simply do. And making 1.20 a 2.0 release would have many good reasons aside from such an ABI break (even if it is just that we are expecting a lot of code churn, both due to SIMD work and changes in the core).

To be clear, I personally do *not* want to aim for a serious ABI break. The vast majority of libraries should not require recompilation, and IMO it must be easy to create a single binary compatible with both old and new versions. If someone wants to aim for a real ABI break, I would be interested to see thoughts on its feasibility, but to me that simply feels like aiming high, and I am not sure there is much gain. A small wave of C-API deprecations and small, technically incompatible changes that most users will never notice does seem plausible to me, though.

Cheers,

Sebastian

[1] You simply have to manually include the larger struct (or we update our headers). The only annoyance is that the crashes/errors you get when running a non-recompiled/old binary against a new NumPy version may be pretty random.

[2] I would also like to search Anaconda or PyPI, sieving through actual code, to confirm that while this may technically be an ABI break, it will affect practically no largish libraries (i.e. large enough to land in Anaconda). If anyone knows how best to do that, I would be interested.
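One hedged reading of footnote [1], "manually include the larger struct", is for the subclass to reserve the extra pointer itself so that a single recompiled binary lines up with both the old and the enlarged base layout. The names below are illustrative only, not a NumPy-provided API, and this assumes the only change is one pointer appended to the end of PyArrayObject.

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Hedged sketch of footnote [1]: reserve room for the pointer a newer
 * NumPy may append, so the subclass's own members start past whichever
 * base layout is in use at runtime.  Against an old NumPy the reserved
 * slot is just unused padding; against a new NumPy it overlays the
 * appended base field, and `extra` still lives in memory the subclass
 * owns in both cases. */
typedef struct {
    PyArrayObject super;       /* old base layout, as in the installed header */
    void *reserved_for_numpy;  /* space for a pointer a newer NumPy may append */
    PyObject *extra;           /* subclass member, safely past both layouts */
} MyCompatArrayObject;
```

With `tp_basicsize = sizeof(MyCompatArrayObject)`, the object is allocated large enough either way, which is the "single binary compatible with both old and new versions" property argued for above.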
participants (2)

- Ralf Gommers
- Sebastian Berg