I thought I would just give my point of view on this, since I do believe we should give these points some thought.
On Oct 23, 2004, at 12:18 AM, Russell E Owen wrote:
OK, since I seem to be in a grumpy mood today, here are some examples (probably nothing new here):
- I'll expose my ignorance, but I find the take stuff and fancy
indexing nearly incomprehensible. I've tried to follow the examples (several times--i.e. every time I need to do something fancy), but generally I either flail around until I find something that works, or give up and write a C extension.
I agree, it is very complicated; I always have trouble understanding what is going on when I use take and indexing. More documentation may help.
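As a starting point for such documentation, here is a minimal sketch of the three selection styles that tend to get confused. It uses NumPy rather than numarray (an assumption made so that the example runs on a current installation), and the array `a` is made up for illustration.

```python
import numpy as np

a = np.array([[10, 11, 12],
              [20, 21, 22],
              [30, 31, 32]])

# take: select whole rows (axis=0) or whole columns (axis=1) by index.
rows = np.take(a, [0, 2], axis=0)   # rows 0 and 2
cols = np.take(a, [2, 0], axis=1)   # columns 2 and 0, in that order

# Fancy indexing with one index list per axis: the lists are paired up
# element by element, so this selects a[0, 2] and a[2, 1].
picked = a[[0, 2], [2, 1]]

# Boolean mask: every element greater than 20, returned as a flat array.
large = a[a > 20]
```

The point that usually causes the flailing is the middle case: index lists given for several axes are matched pairwise, they do not form a rectangular sub-block.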
- I'd like to write C/C++ code that would work on multiple array
types. This seems a natural use of C++ templates, but that doesn't seem to be "how it's done". I hate to think how the internal code is managing this without being a horrible spaghetti of code repeated for each array type.
This is a good point. If you look at examples for implementing something in C, you always see that the code only handles a single data type, usually converting all input to double. That is not always a good way to write an extension if you want it to be of generic use (e.g. the FFT module does not handle 32-bit floating point well, which is a problem for big arrays). Some support for writing functions that handle multiple data types would be good.
The nd_image package is the closest I've come to finding source code that makes any sense to me in this area. But it uses so many custom-defined specialized functions that I figured it was just too much work to figure out w/out a manual (and risky to rely on these functions since they are internal to the package).
The internal nd_image C functions are indeed not exported and should not be used to implement extensions. That is going to stay that way since I do not plan to document these, and in any case, exposing such functions is not the purpose of the module.
On the other hand, some of the techniques used may be generally useful. I could try to factor some of the functions and macros out and write something up on how to use them to write extensions that handle multiple data types.
So I gave up and just support the one data type I really need now. Very disappointing.
Yes, it should be easier to do this, I agree. Using C macros as a 'poor man's' templating system is in fact not too complicated (although pretty ugly).
Another approach that I have tried to use in nd_image is to provide generic functions that take a Python or a C function to implement functionality. For instance, to implement an arbitrary filter function in nd_image you only need to implement a function that calculates the filter at one point. You then call a generic filter function that does the heavy lifting: dealing with multiple array types, iterating over the array, handling borders and such, and applying the function at each array element. The filter function can be in Python, but can also be a C function, passed in as a CObject.
Maybe some functions of this type could be provided with the numarray package. This could simplify writing extensions a lot. Would there be interest in a package of such functions? If there is, I could think about it a bit more, and propose (and implement) something in the form of an extension.
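To make the idea concrete, here is a rough sketch of what such a generic function could look like from the Python side. It is written with NumPy and is not the nd_image implementation; the names `generic_filter_sketch` and `value_range`, the `size` and `mode` parameters, and the choice of edge replication at the borders are all assumptions made for illustration. It assumes an odd window size.

```python
import numpy as np

def generic_filter_sketch(array, function, size=3, mode="edge"):
    """Apply function(window_values) at every element of an N-D array.

    Hypothetical helper: 'function' receives the flattened values of the
    size**ndim neighbourhood around one element and returns a scalar.
    """
    array = np.asarray(array)            # any input data type is accepted
    half = size // 2
    # Border handling: pad every axis so each element has a full window.
    padded = np.pad(array, half, mode=mode)
    output = np.empty(array.shape, dtype=np.float64)
    # Iterate over all element positions, whatever the dimensionality.
    for index in np.ndindex(*array.shape):
        window = tuple(slice(i, i + size) for i in index)
        output[index] = function(padded[window].ravel())
    return output

def value_range(values):
    # The only code the user writes: the filter result at one point.
    return values.max() - values.min()

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=np.int16)
result = generic_filter_sketch(image, value_range)
```

The user-supplied function never sees the array type, the iteration, or the borders. A C version would follow the same division of labour, with the per-point function passed in instead of a Python callable.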
- Important functions are sometimes buried in a non-obvious (to me)
place. For example: try to find the location at which an array has a minimum value (if there's more than one such point, pick any). You'd think it'd be a standard numarray function, wouldn't you? After all, you can ask for the minimum value. Now try to find it.
Agreed, this bothered me too.
Well, I started out by trying to figure out how to get argmin to do the job. Horrible.
Fortunately I finally found minimum_position buried in nd_image.
It is there because numarray did not provide it... But it is also there because it offers much functionality that would not be appropriate for the main package. It is part of the object measurement functions. A simpler, possibly more efficient routine should maybe be part of the main package.
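Such a simpler routine could be little more than an argmin over the flattened array followed by a conversion back to an index tuple. The sketch below uses NumPy (an assumption, so that it runs on a current installation), and `argmin_position` is a hypothetical name, not the nd_image function.

```python
import numpy as np

def argmin_position(array):
    """Return the index tuple of one minimum of an N-D array."""
    array = np.asarray(array)
    flat_index = np.argmin(array)        # index into the flattened array
    # Convert the flat index back into one index per axis.
    position = np.unravel_index(flat_index, array.shape)
    return tuple(int(i) for i in position)

a = np.array([[5, 3, 8],
              [7, 1, 9]])
pos = argmin_position(a)                 # usable directly as a[pos]
```

If the minimum occurs more than once, this returns the first occurrence in memory order, which matches the "pick any" requirement.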
- Masked arrays are not integrated. Thus a lot of important filtering
and stuff simply cannot be done on masked data without writing custom extensions. For instance I'd like to do a median-filter that ignores masked data (taking the median of non-masked data only).
I agree very much! To be honest, I do not like the ma package much. I don't like the idea of having to use a separate package with a different array type that duplicates the functionality in the main package. I think it would be much better if all functions in numarray (where it makes sense) would accept an optional mask argument. To me it makes more sense to provide the mask with the operation, not as part of the array as in ma (a package like ma could still be layered on top). I realize it would be a lot of work to make all numarray functions mask-aware, but it is something to think about, maybe.
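For the median-filter example, passing the mask with the operation could look roughly like the sketch below. It is written with NumPy, and the function name, the `mask` and `size` parameters, and the conventions (True means masked, as in ma; padding counts as masked; an all-masked window yields NaN) are assumptions of this sketch, not an existing numarray or nd_image interface.

```python
import numpy as np

def masked_median_filter(array, mask=None, size=3):
    """Median filter over an N-D array that skips masked elements.

    Hypothetical interface: mask is True where data must be ignored.
    """
    array = np.asarray(array, dtype=np.float64)
    if mask is None:
        mask = np.zeros(array.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    half = size // 2
    padded = np.pad(array, half, mode="constant", constant_values=0.0)
    # Padding is marked as masked, so borders use only real data.
    padded_mask = np.pad(mask, half, mode="constant", constant_values=True)
    output = np.full(array.shape, np.nan)
    for index in np.ndindex(*array.shape):
        window = tuple(slice(i, i + size) for i in index)
        valid = padded[window][~padded_mask[window]]
        if valid.size:                   # leave NaN if nothing is unmasked
            output[index] = np.median(valid)
    return output

data = np.array([1.0, 2.0, 100.0, 4.0, 5.0])
bad = np.array([False, False, True, False, False])
filtered = masked_median_filter(data, mask=bad)
```

The input stays an ordinary array and the mask is just another ordinary array, so nothing in the main package would need a second array type.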
- For 2-d images x and y are reversed. I know this isn't going to
change, but it is a headache every time I have to write new image processing code.
This is not really a problem I think, but you have to get used to it. If you treat the last dimension always as X and the first as Y, you have the same layout in memory as is usual in most image processing software. So X corresponds to axis=1 and Y to axis=0. Or use axis=-1 and axis=-2.
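A small example of the convention, using NumPy (an assumption; the same axis ordering is what is described above for numarray):

```python
import numpy as np

height, width = 2, 3                     # 2 rows (Y), 3 columns (X)
image = np.arange(height * width).reshape(height, width)

x, y = 2, 1
pixel = image[y, x]                      # index order is [Y, X]

sum_over_y = image.sum(axis=0)           # one value per X position
sum_over_x = image.sum(axis=1)           # one value per Y position
same_as_x = image.sum(axis=-1)           # axis=-1 is X for any image

# Pixels adjacent in X are adjacent in memory (row-major layout).
memory_order = image.ravel()
```

Using axis=-1 and axis=-2 has the advantage that the code keeps working if leading dimensions (for example a stack of images) are added later.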