[Numpy-discussion] general newbie question about applying an arbitrary function to all elements in a numpy array

Michael Katz michaeladamkatz at yahoo.com
Tue May 3 19:58:25 EDT 2011


I have a basic question about applying functions to array elements. There is a 
rambling introduction here and then I get to the ACTUAL QUESTION about 2/3 of 
the way down.

RAMBLING INTRODUCTION

I was trying to do a simple mapping of array values to other values. So I had:

    unmapped_colors = np.array((0, 0, 1, 1, 2, 0, 0, 1))

    BNA_LAND_FEATURE_CODE = 0
    BNA_WATER_FEATURE_CODE = 1

    # color_to_int() produces some unsigned int value
    green = color_to_int(0.25, 0.5, 0, 0.75)
    blue = color_to_int(0.0, 0.0, 0.5, 0.75)
    gray = color_to_int(0.5, 0.5, 0.5, 0.75)

    def map_type_to_color(t):
        if t == BNA_LAND_FEATURE_CODE:
            return green
        elif t == BNA_WATER_FEATURE_CODE:
            return blue
        else:
            return gray

    mapped_colors = np.vectorize(map_type_to_color)(unmapped_colors)

I found that using vectorize gave me about a 3x speedup compared to looping over 
the array with a plain Python for loop. One small question: where does that 
speedup actually come from, i.e., what is vectorize doing for me? Is it just 
saving the overhead of the looping logic itself? That seems like too much speedup 
for that alone. Does it also get faster access to the array's data items?
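
(For context, the kind of comparison I mean is roughly the following -- not my 
exact benchmark, just a sketch of its shape, reusing map_type_to_color from above 
on a larger random array of feature codes:)

    import timeit
    import numpy as np

    # A larger random array of feature codes, just for timing purposes.
    test_codes = np.random.randint(0, 3, 100000)

    python_loop = lambda: np.array([map_type_to_color(t) for t in test_codes])
    vectorized = np.vectorize(map_type_to_color)

    print(timeit.timeit(python_loop, number=10))
    print(timeit.timeit(lambda: vectorized(test_codes), number=10))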

So I was trying to find a "pure numpy" solution for this. I then learned about 
fancy indexing and boolean indexing, and saw that I could write a boolean 
indexing version:

    mapped_colors = np.zeros(unmapped_colors.shape, dtype=np.uint32) + gray 
    mapped_colors[unmapped_colors==BNA_LAND_FEATURE_CODE] = green 
    mapped_colors[unmapped_colors==BNA_WATER_FEATURE_CODE] = blue 

and in fact I could write a "fancy indexing" version:

    color_array = np.array((green, blue, gray), dtype=np.uint32) 
    colors = color_array[unmapped_colors] 

The boolean array version is pretty simple, but makes three "passes" over the 
data.

The fancy indexing version is simple and one pass, but it relies on the fact 
that my constant values happen to be nice to use as array indexes. And in fact 
to use it in actual code I'd need to do one or more other passes to check 
unmapped_colors for any indexes < 0 or > 2.
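
(Something like this extra pass is what I have in mind -- just a sketch, reusing 
color_array and unmapped_colors from above, with out-of-range codes falling back 
to the gray slot:)

    # Map any code outside 0..2 to index 2 (gray) before fancy indexing;
    # otherwise an out-of-range code would raise an IndexError or, for a
    # negative code, silently index from the end of color_array.
    valid = (unmapped_colors >= 0) & (unmapped_colors <= 2)
    safe_codes = np.where(valid, unmapped_colors, 2)
    colors = color_array[safe_codes]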

ACTUAL QUESTION

So, the actual question here is, even though I was able to find a couple 
solutions for this particular problem, I am wondering in general about applying 
arbitrary functions to each array element in a "pure numpy" way, ideally using 
just a single pass.

It seems to me like there should be some way to build up a "numpy compiled" 
function that you could then apply to all elements of an array. Is that what 
"universal functions" are about? That's what it sounded like to me at first, but 
after looking closer it didn't seem like that.

I guess I was hoping vectorize was some magical thing that could compile certain 
Python expressions down into "numpy primitives". It seems like as long as you 
stuck to a limited set of math and logical operations on arrays, even if those 
expressions included calls to subroutines that were similarly restricted, there 
should be an automated way to convert such expressions into the equivalent of a 
numpy built-in. I guess it's similar to Cython, but it wouldn't require a 
separate file or special compilation step. I could just say:

    def capped_sqrt(n):
        if n > 100:
            return 10
        else:
            return sqrt(n)

    f = numpy.compile(lambda x: 0 if x < 10 else capped_sqrt(x))
    numpy.map(f, a)

or something like that, and it would all happen in a single pass within numpy, 
with no "python code" being run.

Is that something that exists?

I realize that for any function like this I could build the result out of a 
sequence of individual numpy operations on the whole array, but it seems that in 
general that requires several passes, multiple temporary arrays, and computing 
functions for elements whose results will later be thrown out. This might just be 
because I'm a numpy newbie, but it's not clear to me how to do the example above 
on an array of size N without computing N square roots (see the sketch at the end 
of this message), even if the number of elements in my array between 10 and 100 
is much smaller than N. Granted, I could set things up so that by the time sqrt 
was applied it would only be computing, say, sqrt(0) for all the values that were 
originally < 10 or > 100. But still, it would be nice not to compute 
unnecessarily at all. And efficiency concerns aside, it just seems unfortunate, 
language-wise, to always have to think cleverly about a sequence of whole-array 
transformations, instead of being able to express directly the "single pass 
computation" that is in your head (at least in my head) in the first place and 
is, after all, the most direct and non-redundant way to get the result.
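
To make that concrete, the whole-array sequence I can see how to write for the 
capped-sqrt example is something like this (just a sketch, with some made-up 
sample data; note that np.sqrt runs over every element even though most of its 
results are immediately discarded by the np.where calls):

    import numpy as np

    a = np.random.uniform(0, 200, 1000)   # sample data, all >= 0

    # Three whole-array operations, plus temporaries; sqrt is evaluated
    # for every element, including the ones that end up as 0 or 10.
    result = np.where(a < 10, 0, np.where(a > 100, 10, np.sqrt(a)))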

Michael