[Numpy-discussion] Counting array elements

Peter Verveer verveer at embl-heidelberg.de
Sat Oct 23 04:14:04 EDT 2004


I thought I would give my point of view on this, since I do believe we
should give these issues some thought.

On Oct 23, 2004, at 12:18 AM, Russell E Owen wrote:

> OK, since I seem to be in a grumpy mood today, here are some examples 
> (probably nothing new here):
> - I'll expose my ignorance, but I find the take stuff and fancy 
> indexing nearly incomprehensible. I've tried to follow the examples 
> (several times--i.e. every time I need to do something fancy), but 
> generally I either flail around until I find something that works, or 
> give up and write a C extension.

I agree, it is very complicated; I always have trouble understanding
what is going on when I use take and fancy indexing. More
documentation may help.
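As a small illustration of how take and integer-array ("fancy") indexing relate, here is a sketch written against NumPy, numarray's successor, rather than numarray itself, so details may differ from the numarray behavior discussed here. The main surprise it shows is that indexing with two index arrays pairs them element-wise instead of selecting a rectangular block; the array and index values are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# take selects whole slices along one axis, in the order given
rows = np.take(a, [2, 0], axis=0)   # rows 2 and 0
cols = np.take(a, [1, 3], axis=1)   # columns 1 and 3

# Indexing with a single integer list is the same as take along axis 0
assert (a[[2, 0]] == rows).all()

# Two index lists are paired element-wise: this picks a[0, 1] and a[2, 3],
# not a 2x2 block
picked = a[[0, 2], [1, 3]]

# To get the block rows {0, 2} x columns {1, 3}, build an open mesh with ix_
block = a[np.ix_([0, 2], [1, 3])]
```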

> - I'd like to write C/C++ code that would work on multiple array 
> types. This seems a natural use of C++ templates, but that doesn't 
> seem to be "how it's done". I hate to think how the internal code is 
> managing this without being a horrible spaghetti of code repeated for
> each array type.

This is a good point. If you look at examples for implementing 
something in C, you always see that the code only handles a single data 
type, usually converting all input to double type. That is not always a
good way to write an extension if you want it to be of generic use
(e.g. the FFT module does not handle 32-bit floating point well, which
is a problem for big arrays). Some support for writing functions that
handle multiple data types would be good.
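One general reason that converting everything to double hurts with big arrays is memory: a double takes 8 bytes versus 4 for a 32-bit float, so the converted copy is twice the size of the input. The arithmetic below assumes a hypothetical 4096 x 4096 image and uses NumPy's dtype itemsize, without allocating anything.

```python
import numpy as np

# Hypothetical large image: 4096 x 4096 pixels (counted, not allocated)
n_pixels = 4096 * 4096

bytes_f32 = n_pixels * np.dtype(np.float32).itemsize   # 4 bytes per element
bytes_f64 = n_pixels * np.dtype(np.float64).itemsize   # 8 bytes per element

mib = 2 ** 20
print(bytes_f32 // mib, "MiB as float32")                  # 64 MiB as float32
print(bytes_f64 // mib, "MiB after conversion to double")  # 128 MiB after conversion to double
```

If the float32 original is still referenced while the double copy exists, the peak footprint is the sum of the two.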

> The nd_image package is the closest I've come to finding source code 
> that makes any sense to me in this area. But it uses so many
> custom-defined specialized functions that I figured it was just too 
> much work to figure out w/out a manual (and risky to rely on these 
> functions since they are internal to the package).

The internal nd_image C functions are indeed not exported and should 
not be used to implement extensions. That is going to stay that way 
since I do not plan to document these, and in any case, exposing such 
functions is not the purpose of the module.

On the other hand, some of the techniques used may be generally useful.
I could try to factor some of the functions and macros out and write 
something up on the use of these to write extensions that handle 
multiple data types.

> So I gave up and just support the one data type I really need now. 
> Very disappointing.

Yes, it should be easier to do this, I agree. Using C macros as a 'poor
man's' templating system is in fact not too complicated (although pretty
ugly).

Another approach that I have tried to use in nd_image is to provide
generic functions that take a Python or a C function to implement
functionality. For instance, to implement an arbitrary filter function
in nd_image you only need to implement a function that calculates the
filter at one point. You then call a generic filter function that does
the heavy lifting of dealing with multiple array types, iterating over
the array, dealing with borders and such, and applying the function at
each array element. The filter function can be written in Python, but
can also be a C function, communicated by a CObject.
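The generic-filter idea can be sketched in a few lines of Python. generic_filter_1d below is a hypothetical helper, not the nd_image API (SciPy's scipy.ndimage.generic_filter is the N-d tool in this spirit): it pads the borders with a constant, walks over the array, and calls a user-supplied function on each window, so the user only writes the per-point calculation. Assumptions: 1-D input, constant border padding, float64 output, NumPy instead of numarray, and no attempt at the per-type dispatch a C implementation would do.

```python
import numpy as np

def generic_filter_1d(data, func, size, cval=0.0):
    """Hypothetical helper: apply func to every size-wide window of data.

    The window for element i covers data[i - size // 2 : i - size // 2 + size];
    positions outside the array are filled with cval. func receives a 1-D
    array holding the window and must return a single number.
    """
    data = np.asarray(data)
    half = size // 2
    # Pad so that every element has a full window around it
    padded = np.pad(data, (half, size - 1 - half),
                    mode="constant", constant_values=cval)
    out = np.empty(data.shape, dtype=np.float64)
    # The generic part: iteration and border handling live here,
    # the per-point calculation lives in func
    for i in range(data.shape[0]):
        out[i] = func(padded[i:i + size])
    return out
```

For example, generic_filter_1d([1, 2, 3, 4, 5], np.sum, 3) returns [3, 6, 9, 12, 9] with zero padding; any Python callable, such as a lambda, can be passed the same way.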

Maybe some of these types of functions could be provided with the
numarray package. This could simplify writing extensions a lot. Would
there be interest in a package of such functions? If there is, I could
think about it a bit more, and propose (and implement) something in the
form of an extension.

> - Important functions are sometimes buried in a non-obvious (to me) 
> sub-package.
>
> For example: try to find the location at which an array has a minimum
> value (if there's more than one such point, pick any). You'd think 
> it'd be a standard numarray function, wouldn't you? After all, you can 
> ask for the minimum value. Now try to find it.

Agreed, this bothered me too.

> Well, I started out by trying to figure out how to get argmin to do 
> the job. Horrible.
>
> Fortunately I finally found minimum_position buried in nd_image.

It is there because numarray did not provide it... But it is also there
because it offers a lot of functionality that would not be appropriate
for the main package. It is part of the object measurement functions. A
simpler, possibly more efficient routine should perhaps be part of the
main package.
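A simple position-of-minimum routine of the kind suggested above can be built from two calls, shown here with NumPy (numarray's successor) rather than numarray: argmin searches the flattened array, and unravel_index turns the flat index back into an N-d index. min_position is a hypothetical name; with ties it returns the first minimum in row-major order.

```python
import numpy as np

def min_position(a):
    """Hypothetical helper: index tuple of one minimum of an N-d array."""
    a = np.asarray(a)
    flat_index = np.argmin(a)   # without an axis, argmin works on the flattened array
    # Convert the flat (row-major) index back to one index per dimension
    return tuple(int(i) for i in np.unravel_index(flat_index, a.shape))
```

For example, min_position([[5, 3, 7], [4, 9, 0]]) gives (1, 2).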

> - Masked arrays are not integrated. Thus a lot of important filtering 
> and stuff simply cannot be done on masked data without writing custom 
> extensions. For instance I'd like to do a median-filter that ignores 
> masked data (taking the median of non-masked data only).

I agree very much! To be honest, I do not like the ma package much. I 
don't like the idea of having to use a separate package with a 
different array type that duplicates the functionality in the main 
package. I think it would be much better if all numarray functions
(where it makes sense) accepted an optional mask argument. To me it
makes more sense to provide the mask with the operation, not as part of
the array as in ma (a package like ma could still be layered on top). I
realize it would be a lot of work to make all numarray functions mask
aware, but it may be something to think about.
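Here is a minimal sketch of the mask-as-argument style, applied to the masked median filter from the quoted request. masked_median_filter_1d is hypothetical and uses NumPy rather than numarray: the mask is passed alongside plain data instead of living in a separate masked-array type, it follows the numpy.ma convention that True means "ignore this value", windows are truncated at the array ends, and the output is NaN where a window holds no valid data.

```python
import numpy as np

def masked_median_filter_1d(data, mask, size):
    """Hypothetical helper: sliding median that skips masked samples.

    mask[i] == True means data[i] is ignored. Windows are truncated at the
    array ends; an all-masked window yields NaN.
    """
    data = np.asarray(data, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    n = data.shape[0]
    half = size // 2
    out = np.full(data.shape, np.nan)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + size - half)
        valid = data[lo:hi][~mask[lo:hi]]   # keep only unmasked samples
        if valid.size:
            out[i] = np.median(valid)
    return out
```

With data [1, 100, 3, 4, 5], the outlier 100 masked and size 3, the result is [1, 2, 3.5, 4, 4.5]: the masked value never enters a median.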

> - For 2-d images x and y are reversed. I know this isn't going to 
> change, but it is a headache every time I have to write new image 
> processing code.

This is not really a problem, I think, but you have to get used to it.
If you always treat the last dimension as X and the first as Y, you
have the same layout in memory as is usual in most image processing
software. So X corresponds to axis=1 and Y to axis=0, or equivalently
axis=-1 and axis=-2.
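A short NumPy sketch of this convention (NumPy is numarray's successor; the image size and values are arbitrary assumptions): an image stored with shape (height, width) is indexed as img[y, x], consecutive x values are adjacent in memory, and reducing along X uses axis=1, the same as axis=-1.

```python
import numpy as np

height, width = 3, 4
img = np.arange(height * width).reshape(height, width)   # shape is (Y, X)

# Row-major layout: x varies fastest, so pixel (x, y) sits at y * width + x
x, y = 3, 1
assert img[y, x] == img.ravel()[y * width + x]

row_profile = img.sum(axis=1)   # collapse X: one value per row (length = height)
col_profile = img.sum(axis=0)   # collapse Y: one value per column (length = width)

# Negative axes count from the end: axis=-1 is X, axis=-2 is Y
assert (img.sum(axis=-1) == row_profile).all()
assert (img.sum(axis=-2) == col_profile).all()
```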

Cheers, Peter




