Exporting Numpy C functionality to other extension modules
Actually, wrt my previous message on CObjects for communicating between extension modules, we can do one better!

This is an idea I've been toying with for the MacPython extension types, and I think it's applicable to Numeric too. It goes as follows.

Each Numeric object has an attribute with a well-known name, let's call it "__Numeric_C_interface". This is a CObject, and it is shared among all Numeric objects of the same type. The value of this CObject is a pointer to a C structure with pointers to all the C routines you might want to call on the object, basically the PyArray_API structure (I think). The descr of the CObject is a string with the version number of this particular PyArray_API structure.

An extension module that knows about this protocol and gets passed an object that it thinks might be a Numeric array checks whether the object has a __Numeric_C_interface attribute. If so, it retrieves it, checks that it is a CObject, gets the descriptor and tests it for compatibility, and if it is compatible gets the CObject pointer and happily calls all the Numeric routines it needs.

--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
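The handshake described in this message can be sketched in plain C, with no Python or Numeric headers, so that it compiles on its own. Everything named here is hypothetical and chosen only for illustration: `api_table` stands in for the PyArray_API structure, `c_interface` stands in for a CObject (a pointer plus a descriptor string), and the version string "1" is an assumed value. In a real extension the record would be fetched from the object's __Numeric_C_interface attribute rather than passed in directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of routines exported by the array package. */
typedef struct {
    int (*rank)(const void *array);
} api_table;

/* Stand-in for a CObject: an opaque pointer plus a descriptor string. */
typedef struct {
    void *pointer;       /* points at an api_table */
    const char *descr;   /* version of that api_table layout */
} c_interface;

/* Version this client was compiled against (an assumed value). */
#define CLIENT_API_VERSION "1"

/* Return the table if the interface is present and compatible, else NULL. */
static const api_table *get_api(const c_interface *iface)
{
    if (iface == NULL || iface->pointer == NULL || iface->descr == NULL)
        return NULL;                 /* attribute missing or malformed */
    if (strcmp(iface->descr, CLIENT_API_VERSION) != 0)
        return NULL;                 /* table layout is not the one we know */
    return (const api_table *)iface->pointer;
}

/* Toy provider used only to exercise the check. */
static int toy_table_rank(const void *array) { (void)array; return 2; }
static api_table toy_table = { toy_table_rank };
static c_interface toy_v1 = { &toy_table, "1" };
static c_interface toy_v2 = { &toy_table, "2" };

/* Usage: only a matching descriptor yields a callable table. */
static void demo_handshake(void)
{
    const api_table *api = get_api(&toy_v1);
    assert(api != NULL && api->rank(NULL) == 2);
    assert(get_api(&toy_v2) == NULL);   /* version mismatch is refused */
    assert(get_api(NULL) == NULL);      /* object without the attribute */
}
```

An exact string comparison is the strictest possible compatibility test; a real protocol might accept a range of versions instead, which this sketch does not attempt.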
On Wednesday 15 January 2003 23:33, Jack Jansen wrote:
Actually, wrt my previous message on cobjects for communicating between extension modules, we can do one better!
This is an idea I've been toying with for the MacPython extension types, and I think it's applicable to Numeric too. It goes as follows.
Each Numeric object has an attribute with a well-known name, let's call it "__Numeric_C_interface". This is a CObject, and it is shared among all Numeric objects of the same type. The value of this CObject is a pointer to a C structure with pointers to all the C routines you might want to call on the object, basically the PyArray_API structure (I think). The descr of the CObject is a string with the version number of this particular PyArray_API structure.
An extension module that knows about this protocol and gets passed an object that it thinks might be a Numeric array checks whether the object has a __Numeric_C_interface attribute. If so, it retrieves it, checks that it is a CObject, gets the descriptor and tests it for compatibility, and if it is compatible gets the CObject pointer and happily calls all the Numeric routines it needs.
That's a nice idea. But I see two drawbacks:

- numarray needs to be reworked to include the CObject descriptors, although I don't know whether this would be difficult or not.
- You still need to have Numeric or numarray installed on the client machine. This could be the usual case, but what about extensions that want to use Numeric internally (for a number of reasons, like better number representation, a convenient interface to C, etc.) without forcing the user to install it?

However, designing a small library with a minimalist API (I'm thinking of something similar to zlib) could be very handy in allowing extensions (but also native Python modules) to deal with numarray objects. As I said before, this would require the user to install only this small library, but it could also be included in the application or package.

However, this second alternative can be tricky, as Chris Barker has pointed out, because of the different numarray versions coming in the future. But IMO a series of factors may alleviate this handicap:

- The numarray data structure should be very stable, as improvements are normally made at the functionality level.
- The library should provide a minimalistic, high-level API that, if it is well designed, should cope with small modifications in the numarray data structures.
- Finally, when such differences have to be added and they would break the current API, that version should be marked as a major release, and existing extensions (or whatever software is embedding the library) will know that they have to release new versions if they want to support the newest objects. But, hopefully, that should happen quite infrequently.

Of course, this small library should cope with both numarray and Numeric (at least, the not too old versions of it) objects. But I think this shouldn't pose a big problem, as the current numarray API can already do that.

This logical separation between structure and functionality might also lead to better acceptance by numerical software craftsmen, as they can be more confident that the API for dealing with numarray objects will be quite stable over time.

Well, this is just a thought. I must confess that I'm so interested in this issue because I really want to support numarray objects in my project, and I'm just wondering which is the best way to do that without creating too much nuisance for the users. In fact, I'm considering building such a library myself, but that could be a waste of time if I have to redo it for every numarray release.

Cheers,

--
Francesc Alted
Jack Jansen wrote:
An extension module that knows about this protocol and gets passed an object that it thinks might be a Numeric array checks whether the object has a __Numeric_C_interface attribute. If so, it retrieves it, checks that it is a CObject, gets the descriptor and tests it for compatibility, and if it is compatible gets the CObject pointer and happily calls all the Numeric routines it needs.
Wow Jack! Are you single-handedly going to implement all my pet projects that I'm too stupid to know how to do myself? (The other one was universal text file support.)

I can only barely follow what you're suggesting, but I still have a question about it. It seems that while this would provide a way for an extension module to identify whether an object was a Numeric array, and then get a pointer to it, how would it know the API for dealing with the arrays without the Numeric header file? Or would you have to include the header file when compiling, but not need the library at runtime unless it was actually used? That seems a reasonable compromise.

If this would work, I think it's a great idea. Short of including NumArray with the standard library (which I imagine is at least a couple of Python releases away), it would be a great solution for folks who are writing extensions that they want to be able to take advantage of Numeric when it's there, but not require it.

Do any of the primary Numarray developers think this is a good and doable idea?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
Chris.Barker@noaa.gov
Chris Barker wrote:
Jack Jansen wrote:
An extension module that knows about this protocol and gets passed an object that it thinks might be a Numeric array checks whether the object has a __Numeric_C_interface attribute. If so, it retrieves it, checks that it is a CObject, gets the descriptor and tests it for compatibility, and if it is compatible gets the CObject pointer and happily calls all the Numeric routines it needs.
Wow Jack! Are you single-handedly going to implement all my pet projects that I'm too stupid to know how to do myself? (The other one was universal text file support.)
I can only barely follow what you're suggesting, but I still have a question about it. It seems that while this would provide a way for an extension module to identify whether an object was a Numeric array, and then get a pointer to it, how would it know the API for dealing with the arrays without the Numeric header file? Or would you have to include the header file when compiling, but not need the library at runtime unless it was actually used? That seems a reasonable compromise.
If this would work, I think it's a great idea. Short of including NumArray with the standard library (which I imagine is at least a couple of Python releases away), it would be a great solution for folks who are writing extensions that they want to be able to take advantage of Numeric when it's there, but not require it.
Do any of the primary Numarray developers think this is a good and doable idea?
Roll out the time machine... it's already done. As long as you don't define the macros PY_ARRAY_UNIQUE_SYMBOL or NO_IMPORT_ARRAY, any file that includes arrayobject.h gets a static copy of PyArray_API. If the module executes import_array() at an appropriate time, normally module initialization, but not necessarily, the static PyArray_API gets filled in and becomes usable. The import_array() call is critical; without it, API calls through the static PyArray_API are calls to NULL and segfault. I think that if Numeric is not present, and you call import_array(), it will fail quietly but leave the Python error status set. So it might make sense to call PyErr_Clear() after doing import_array().
So it sounds like your whole "weak linkage" scheme is plausible now with Numeric (maybe even numarray!), as would be a minimal API module.

1. We discussed yesterday how to determine if an object is a Numeric array w/o even compiling with arrayobject.h. The important idea there was that if Numeric is not present, the "isarray" (or whatever) function will return false rather than segfaulting because the API pointer isn't filled in.

2. Call API functions in contexts where you know you're looking at Numeric arrays, i.e., right after isarray(). This creates a guard which prevents you from calling API functions when Numeric is not present.

3. Call import_array() at some time before using the API functions, possibly at module init time, failing quietly and clearing the error in installations where Numeric is not installed.

Todd
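The three steps above can be illustrated with a small standalone C sketch that needs neither Python nor Numeric to compile. It is a simulation, not the real API: `array_api`, `import_api()`, `is_array()` and the toy provider are hypothetical stand-ins for PyArray_API, import_array() and the "isarray" guard, and the idea that the import step receives a provider pointer is an assumption made only so the two cases (installed, not installed) can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API table; stays NULL until the import step succeeds. */
typedef struct {
    int (*is_array)(const void *obj);
    int (*rank)(const void *obj);
} array_api;

static const array_api *api = NULL;

/* Step 1: a guard that is safe everywhere; false while the table is missing. */
static int is_array(const void *obj)
{
    return api != NULL && api->is_array(obj);
}

/* Step 3: stand-in for import_array(). Returns 0 on success, -1 when the
   provider is absent (a real module would clear the Python error here). */
static int import_api(const array_api *provider)
{
    if (provider == NULL)
        return -1;
    api = provider;
    return 0;
}

/* Step 2: API calls happen only behind the guard. */
static int rank_or_minus_one(const void *obj)
{
    if (!is_array(obj))
        return -1;
    return api->rank(obj);
}

/* Toy provider: treats any non-NULL pointer as a rank-1 array. */
static int toy_is_array(const void *obj) { return obj != NULL; }
static int toy_rank(const void *obj) { (void)obj; return 1; }
static const array_api toy_provider = { toy_is_array, toy_rank };

static void demo_guard(void)
{
    int dummy = 0;
    assert(import_api(NULL) == -1);            /* package not installed */
    assert(rank_or_minus_one(&dummy) == -1);   /* guard refuses, no crash */
    assert(import_api(&toy_provider) == 0);    /* package installed */
    assert(rank_or_minus_one(&dummy) == 1);
    assert(rank_or_minus_one(NULL) == -1);     /* not an array */
}
```

The short-circuit `&&` in the guard is what keeps the client from ever dereferencing a NULL table, which is the property the whole scheme relies on.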
Take a look at the attached extension module "testlite", which demonstrates the technique I evolved from this discussion. As we discussed, this usage pattern enables the construction of an extension which will take advantage of numarray if it is there, but will continue to work if the user has not installed numarray.

Here's how it works:

1. I created a new API function, PyArray_isArray(), which is safe to call in all contexts. I defined it as:

#define PyArray_isArray(o) (PyArray_API && NA_isNumArray(o))

I added NA_isNumArray(o) to the numarray C-API because it was the easy way to do it.

2. Ordinary API functions are safe to call once an object has been identified to be a numarray, because it implies (locally) that the PyArray_API pointer has been initialized.

3. I tried out the standard import_array() code and added some cleanup for the case where numarray is not installed.

The only caveat I see at this point is that you are required to include numarray headers in order to use this. In numarray's case, this might necessitate header updates and/or function call modifications. The numarray C-API should stabilize pretty soon, but I don't think it's quite there yet.

The same approach should apply to Numeric. This stuff is in numarray CVS now and should be in the next numarray release.

Todd

/* This module demonstrates how to make a weak linkage between an
   extension module (which might consider numarray to be optional) and
   numarray. There are essentially two parts to the "weak linkage":

   1. PyArray_isArray(o) identifies if an arbitrary Python object is a
   NumArray. Once it is known that an object is a numarray, assume that
   PyArray_API has been initialized and call API functions.

   PyArray_isArray(o) is unique in the sense that it will work correctly
   even if numarray is not installed. In that case, it always reports
   false. Because of this, PyArray_isArray() is useful as a guard around
   arbitrary API functions which should never be called when numarray is
   not present. Doing so results in a call to NULL and ensuing segfault.

   2. At module init time, prior to calling any numarray API functions,
   call import_array(). Since, in the case where numarray has *not* been
   installed, the import_array() call will fail, check for and clear the
   Python error state. This will prevent the Python exception handling
   mechanisms from inadvertently trapping the error later on.
*/

#include "Python.h"

#include <stdio.h>
#include <math.h>
#include <signal.h>
#include <ctype.h>

#include "arrayobject.h"

static PyObject *_Error;

/* testit(obj): report on stderr whether obj is a numarray array. */
static PyObject *
Py_testit(PyObject *obj, PyObject *args)
{
    PyObject *it;

    if (!PyArg_ParseTuple(args, "O", &it))
        return PyErr_Format(_Error, "testit: Invalid parameters.");

    if (PyArray_isArray(it)) {
        fprintf(stderr, "It's an array.\n");
        /* It's safe to call API functions in here */
    } else {
        fprintf(stderr, "It's not an array.\n");
        /* But never call numarray API functions out here. */
    }
    Py_INCREF(Py_None);
    return Py_None;
}

static PyMethodDef _Methods[] = {
    {"testit", Py_testit, METH_VARARGS,
     "testit(obj) prints a message identifying an object."},
    {NULL, NULL} /* Sentinel */
};

/* platform independent */
#ifdef MS_WIN32
__declspec(dllexport)
#endif
void inittestlite(void)
{
    PyObject *m, *d;
    m = Py_InitModule("testlite", _Methods);
    d = PyModule_GetDict(m);
    _Error = PyErr_NewException("testlite.error", NULL, NULL);
    PyDict_SetItemString(d, "error", _Error);

    /* Fails when numarray is absent; clear the error and carry on. */
    import_array();
    if (PyErr_Occurred())
        PyErr_Clear();
}

/*
 * Local Variables:
 * mode: C
 * c-file-style: "python"
 * End:
 */
Konrad Hinsen wrote:
M = array(l)
Mt = M.transpose()
just isn't that much worse than:
Mt = transpose(l)
No, but the automatic conversion enables me to write functions that accept any sequence type without even having to think about it.
I've used that too, but I also frequently use something like this:

def function(A):
    A = array(A)
    ...

Which is pretty simple too.
Moreover, it is almost essential in many situations to accept scalars in place of arrays, because scalars fulfill the role of rank-0 arrays.
Yes, this is critical. Isn't there a plan to make the scalar -- rank-0 array dichotomy a little cleaner in NumArray?
I also agree that the point is not subclassing per se, it's polymorphism. It should be easy to write a class that acts like an array in all the ways that you need it to.
True, and that is a weak point of NumPy.
Is this getting any better with NumArray?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
Chris.Barker@noaa.gov
Yes, this is critical. Isn't there a plan to make the scalar -- rank-0 array dichotomy a little cleaner in NumArray?
Hmmm, I'd like to say yes, but I'm not sure what exactly you are referring to. Please elaborate on how you think it should be changed. About the only thing that comes to mind is that repr() for rank-0 will be different for numarray than Numeric, and that it will never be the result of any reduction or similar selection.
I also agree that the point is not subclassing per se, it's polymorphism. It should be easy to write a class that acts like an array in all the ways that you need it to.
True, and that is a weak point of NumPy.
Is this getting any better with NumArray?
Again, I hope so, but I find this too general to know if it satisfies anyone's specific goals. I'd like to see specific examples. I think it is often trickier than people initially think.

Perry
participants (5)
- Chris Barker
- Francesc Alted
- Jack Jansen
- Perry Greenfield
- Todd Miller