calling NumPy from Julia - a plea for fewer macros
Dear NumPy developers,
I've been working on a glue package that allows the Julia language (http://julialang.org/) to call Python routines easily https://github.com/stevengj/PyCall.jl and I'm using NumPy to pass multidimensional arrays between languages.
Julia has the ability to call C functions directly (without writing C glue), and I've been exploiting this to write PyCall purely in Julia. (This is nice for a number of reasons; besides programming and linking convenience, it means that I can dynamically load different Python versions on the same machine, and don't need to recompile if e.g. NumPy is updated.) However, calling NumPy has been a challenge, because of NumPy's heavy reliance on macros in its C API.
I wanted to make a couple of suggestions to keep in mind as you plan for NumPy 2.0:
1) Dynamically linking to NumPy's C API was challenging, to say the least. Assuming you stick with the PyArray_API lookup table of pointers, it would be much easier to call from other languages if you include e.g. a numpy.core.multiarray._ARRAY_API_NAMES variable in the Python module that is a list of strings giving the symbol names corresponding to the numpy.core.multiarray._ARRAY_API pointer. (Plus documentation, of course.) Currently I need to parse __multiarray_api.h to extract this information, which is somewhat hackish.
2) Please provide non-macro equivalents (exported in the _ARRAY_API symbol table or otherwise) of PyArray_NDIM etcetera to access PyArrayObject members. (e.g. call them PyArray_ndim etc. Note that inline functions are not enough, since they are not loadable dynamically.) Right now, the only ways[*] I can see to access this information are either to use C glue (which I want to avoid for the reasons above) or to call Python to access the __array_interface__ attribute (which is suboptimal from a performance standpoint).
Thanks for all your efforts! Any feedback on PyCall would be welcome, too.
--SGJ
[*] A third way would be to parse ndarraytypes.h to extract the format of the PyArrayObject_fields structure, and use upcoming Julia support for accessing C struct types to read the fields. This is likely to require tracking NumPy releases carefully to avoid breakage, however, as well as involving some care with the PyObject_HEAD macro.
PS. If you want to try out PyCall with NumPy, note that a patch to Julia is currently required for this to work: https://github.com/JuliaLang/julia/pull/2317
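[Editorial sketch] For readers unfamiliar with the __array_interface__ route mentioned in point 2, the following is a minimal illustration of what a foreign caller has to do: fetch one Python dictionary and unpack the shape, type string, and data address from it. To keep it runnable without NumPy, it uses a hypothetical stand-in class (FakeArray) that exposes the same protocol keys; FakeArray, describe, and read_float64 are names invented here and are not part of NumPy or PyCall.

```python
import ctypes
import sys

# Native-endian float64 type string, in the array interface "typestr" format.
NATIVE_F8 = "<f8" if sys.byteorder == "little" else ">f8"


class FakeArray:
    """Hypothetical stand-in for an ndarray: a 2x3 C-contiguous float64 block."""

    def __init__(self):
        self._buf = (ctypes.c_double * 6)(*range(6))

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": (2, 3),
            "typestr": NATIVE_F8,
            "data": (ctypes.addressof(self._buf), False),  # (address, read-only flag)
            "strides": None,  # None means C-contiguous
        }


def describe(obj):
    """Collect ndim, shape, type string and data address from one attribute lookup."""
    info = obj.__array_interface__
    shape = tuple(info["shape"])
    address, readonly = info["data"]
    return {
        "ndim": len(shape),
        "shape": shape,
        "typestr": info["typestr"],
        "address": address,
        "readonly": readonly,
    }


def read_float64(address, flat_index):
    """Read one float64 directly from memory, as a foreign caller would."""
    return ctypes.cast(address, ctypes.POINTER(ctypes.c_double))[flat_index]


if __name__ == "__main__":
    arr = FakeArray()
    meta = describe(arr)
    print(meta["ndim"], meta["shape"], read_float64(meta["address"], 5))
```

Every query goes through a Python attribute lookup and a dictionary, which is the overhead the message describes as suboptimal compared with calling an exported C accessor.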
On 17 Feb 2013 08:13, "Steven G. Johnson" wrote:
Julia has the ability to call C functions directly (without writing C glue), and I've been exploiting this to write PyCall purely in Julia. (This is nice for a number of reasons; besides programming and linking convenience, it means that I can dynamically load different Python versions on the same machine, and don't need to recompile if e.g. NumPy is updated.) However, calling NumPy has been a challenge, because of NumPy's heavy reliance on macros in its C API.
I wanted to make a couple of suggestions to keep in mind as you plan for NumPy 2.0:
There are currently no plans to produce a NumPy 2.0, but everything you suggest would be just fine as changes to numpy 1.x. PRs gratefully accepted.
-n
Nathaniel Smith wrote:
There are currently no plans to produce a NumPy 2.0, but everything you suggest would be just fine as changes to numpy 1.x. PRs gratefully accepted.
Thanks, just posted
https://github.com/numpy/numpy/issues/2997
https://github.com/numpy/numpy/issues/2998
--SGJ
On Sun, Feb 17, 2013 at 9:12 AM, Steven G. Johnson <stevenj@alum.mit.edu> wrote:
Dear NumPy developers,
I've been working on a glue package that allows the Julia language (http://julialang.org/) to call Python routines easily https://github.com/stevengj/PyCall.jl and I'm using NumPy to pass multidimensional arrays between languages.
Julia has the ability to call C functions directly (without writing C glue), and I've been exploiting this to write PyCall purely in Julia. (This is nice for a number of reasons; besides programming and linking convenience, it means that I can dynamically load different Python versions on the same machine, and don't need to recompile if e.g. NumPy is updated.) However, calling NumPy has been a challenge, because of NumPy's heavy reliance on macros in its C API.
I wanted to make a couple of suggestions to keep in mind as you plan for NumPy 2.0:
1) Dynamically linking to NumPy's C API was challenging, to say the least. Assuming you stick with the PyArray_API lookup table of pointers, it would be much easier to call from other languages if you include e.g. a numpy.core.multiarray._ARRAY_API_NAMES variable in the Python module that is a list of strings giving the symbol names corresponding to the numpy.core.multiarray._ARRAY_API pointer. (Plus documentation, of course.) Currently I need to parse __multiarray_api.h to extract this information, which is somewhat hackish.
It shouldn't be too much work to provide something like that. The current API is generated; take a look at numpy/core/code_generators/numpy_api.py. PRs welcome.
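[Editorial sketch] To make request 1 concrete, here is a rough illustration of how a list of names paired with a table of raw function pointers lets a caller in another language resolve an entry by name instead of parsing a header. It does not touch NumPy: CPython's own exported functions stand in for the table entries so the sketch runs anywhere, and API_NAMES, API_TABLE, and lookup are hypothetical names chosen for this example only.

```python
import ctypes
import sys

# Stand-in for a pointer table such as PyArray_API. The entries are addresses
# of CPython's own exported functions, used purely for illustration.
API_NAMES = ["Py_GetVersion", "Py_IsInitialized"]
API_TABLE = (ctypes.c_void_p * len(API_NAMES))(
    *[
        ctypes.cast(getattr(ctypes.pythonapi, name), ctypes.c_void_p).value
        for name in API_NAMES
    ]
)


def lookup(name, restype, *argtypes):
    """Resolve a table slot by name and wrap the raw pointer as a callable."""
    index = API_NAMES.index(name)  # the names list gives the slot number
    prototype = ctypes.PYFUNCTYPE(restype, *argtypes)  # keeps the GIL held
    return prototype(API_TABLE[index])


if __name__ == "__main__":
    get_version = lookup("Py_GetVersion", ctypes.c_char_p)
    print(get_version().decode())
```

The caller still has to know each function's signature; the names list only removes the need to hard-code or scrape the slot numbers.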
2) Please provide non-macro equivalents (exported in the _ARRAY_API symbol table or otherwise) of PyArray_NDIM etcetera to access PyArrayObject members. (e.g. call them PyArray_ndim etc. Note that inline functions are not enough, since they are not loadable dynamically.) Right now, the only ways[*] I can see to access this information are either to use C glue (which I want to avoid for the reasons above) or to call Python to access the __array_interface__ attribute (which is suboptimal from a performance standpoint).
There are already functional versions of PyArray_NDIM and some others, put in as part of a long-term project to hide the numpy internals so that we can modify structures and such at some point. We could use more work in that direction and would welcome any input/PRs you might offer. The current functions can be used instead of the macros by putting
#define NPY_NO_DEPRECATED_API NPY_API_VERSION
before any includes. The NPY_API_VERSION serves to mark which functions were introduced in which numpy version, so as to maintain backward compatibility with third-party code. See the lines starting at 1377 in ndarraytypes.h for currently available functions.
There might also be some useful things in dynd/blaze which, IIRC, support numpy for some computations. They are located at https://github.com/ContinuumIO/
Chuck
On Sun, Feb 17, 2013 at 12:43 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sun, Feb 17, 2013 at 9:12 AM, Steven G. Johnson <stevenj@alum.mit.edu> wrote:
2) Please provide non-macro equivalents (exported in the _ARRAY_API symbol table or otherwise) of PyArray_NDIM etcetera to access PyArrayObject members. (e.g. call them PyArray_ndim etc. Note that inline functions are not enough, since they are not loadable dynamically.) Right now, the only ways[*] I can see to access this information are either to use C glue (which I want to avoid for the reasons above) or to call Python to access the __array_interface__ attribute (which is suboptimal from a performance standpoint).
There are already functional versions of PyArray_NDIM and some others, put in as part of a long-term project to hide the numpy internals so that we can modify structures and such at some point. We could use more work in that direction and would welcome any input/PRs you might offer. The current functions can be used instead of the macros by putting
#define NPY_NO_DEPRECATED_API NPY_API_VERSION
before any includes. The NPY_API_VERSION serves to mark which functions were introduced in which numpy version, so as to maintain backward compatibility with third-party code. See the lines starting at 1377 in ndarraytypes.h for currently available functions.
There might also be some useful things in dynd/blaze which, IIRC, support numpy for some computations. They are located at https://github.com/ContinuumIO/
Oops, sorry, I didn't see your comments about inline functions. I don't see why something like this couldn't be supported, perhaps as a library like we have for math functions, umath.so.
Chuck
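[Editorial sketch] The point about inline functions and macros not being loadable dynamically can be seen with CPython's own C API, used here only as an analogy to the NumPy case. PyList_Size is a real exported function and can be found in the loaded library by name, while PyList_GET_SIZE exists only in the headers (as a macro or static inline function, depending on the Python version) and so has no symbol to look up. The helper names is_loadable and list_size are invented for this example.

```python
import ctypes

api = ctypes.pythonapi  # handle to the already-loaded Python library


def is_loadable(name):
    """Return True if `name` is an exported symbol that can be resolved at run time."""
    try:
        getattr(api, name)
        return True
    except AttributeError:
        return False


# The exported accessor can be declared and called without any C glue.
_PyList_Size = api.PyList_Size
_PyList_Size.argtypes = [ctypes.py_object]
_PyList_Size.restype = ctypes.c_ssize_t


def list_size(obj):
    """Call the exported C function PyList_Size on a Python list."""
    return _PyList_Size(obj)


if __name__ == "__main__":
    print("PyList_Size loadable:", is_loadable("PyList_Size"))
    print("PyList_GET_SIZE loadable:", is_loadable("PyList_GET_SIZE"))
    print("size:", list_size([1, 2, 3]))
```

A shared library of exported accessors along the lines of PyArray_ndim, as suggested in the thread, would put the NumPy accessors in the first category.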
participants (3)
- Charles R Harris
- Nathaniel Smith
- Steven G. Johnson