[Numpy-discussion] general newbie question about applying an arbitrary function to all elements in a numpy array

Benjamin Root ben.root at ou.edu
Tue May 3 20:03:32 EDT 2011


On Tue, May 3, 2011 at 6:58 PM, Michael Katz <michaeladamkatz at yahoo.com> wrote:

> I have a basic question about applying functions to array elements. There
> is a rambling introduction here and then I get to the ACTUAL QUESTION about
> 2/3 of the way down.
>
> RAMBLING INTRODUCTION
>
> I was trying to do a simple mapping of array values to other values. So I
> had:
>
>                     unmapped_colors = np.array((0,0,1,1,2,0,0,1))
>                     BNA_LAND_FEATURE_CODE = 0
>                     BNA_WATER_FEATURE_CODE = 1
>                     # color_to_int() produces some unsigned int value
>                     green = color_to_int( 0.25, 0.5, 0, 0.75 )
>                     blue = color_to_int( 0.0, 0.0, 0.5, 0.75 )
>                     gray = color_to_int( 0.5, 0.5, 0.5, 0.75 )
>                     def map_type_to_color( t ):
>                         if ( t == BNA_LAND_FEATURE_CODE ):
>                             return green
>                         elif ( t == BNA_WATER_FEATURE_CODE ):
>                             return blue
>                         else:
>                             return gray
>                     mapped_colors = np.vectorize( map_type_to_color )( unmapped_colors )
>
> I found that by using vectorize I got about a 3x speedup compared to using
> a python for loop to loop through the array. One small question is I wonder
> where that speedup comes from; i.e., what vectorize is actually doing for
> me. Is it just the time for the actual looping logic? That seems like too
> much speedup. Does it also include a faster access of the array data items?
>
> So I was trying to find a "pure numpy" solution for this. I then learned
> about fancy indexing and boolean indexing, and saw that I could do a boolean
> array version:
>
>     mapped_colors = np.zeros(unmapped_colors.shape, dtype=np.uint32) + gray
>
>     mapped_colors[unmapped_colors==BNA_LAND_FEATURE_CODE] = green
>     mapped_colors[unmapped_colors==BNA_WATER_FEATURE_CODE] = blue
>
> and in fact I could do a "fancy indexing" version:
>
>     color_array = np.array((green, blue, gray), dtype=np.uint32)
>     colors = color_array[unmapped_colors]
>
> The boolean array version is pretty simple, but makes three "passes" over
> the data.
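>
> Incidentally, np.select can express the same mapping in one expression,
> evaluating each condition once and picking per element. A sketch, with
> placeholder integer color values standing in for color_to_int() results:

```python
import numpy as np

# Placeholder color values; in the real code these come from color_to_int().
green, blue, gray = 1, 2, 3
BNA_LAND_FEATURE_CODE = 0
BNA_WATER_FEATURE_CODE = 1
unmapped_colors = np.array((0, 0, 1, 1, 2, 0, 0, 1))

# np.select picks, per element, the value of the first matching
# condition, falling back to `default` where none match.
mapped_colors = np.select(
    [unmapped_colors == BNA_LAND_FEATURE_CODE,
     unmapped_colors == BNA_WATER_FEATURE_CODE],
    [green, blue],
    default=gray,
)
print(mapped_colors.tolist())  # [1, 1, 2, 2, 3, 1, 1, 2]
```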
>
> The fancy indexing version is simple and one pass, but it relies on the
> fact that my constant values happen to be nice to use as array indexes. And
> in fact to use it in actual code I'd need to do one or more other passes to
> check unmapped_colors for any indexes < 0 or > 2.
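>
> One way to guard the lookup without a separate validity pass would be to
> redirect out-of-range codes to the gray slot before indexing, e.g. with
> np.where. A sketch (the stray codes here are hypothetical):

```python
import numpy as np

# Placeholder color values; gray sits at index 2 of the lookup table.
green, blue, gray = 1, 2, 3
color_array = np.array((green, blue, gray), dtype=np.uint32)

# Feature codes with a couple of hypothetical out-of-range values mixed in.
codes = np.array((0, 1, 5, -1, 2))

# Map any out-of-range code to the gray slot (index 2) before indexing;
# np.where builds the sanitized index array in one vectorized step.
safe = np.where((codes < 0) | (codes >= len(color_array)), 2, codes)
colors = color_array[safe]
print(colors.tolist())  # [1, 2, 3, 3, 3]
```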
>
> ACTUAL QUESTION
>
> So, the actual question here is, even though I was able to find a couple
> solutions for this particular problem, I am wondering in general about
> applying arbitrary functions to each array element in a "pure numpy" way,
> ideally using just a single pass.
>
> It seems to me like there should be a some way to build up a "numpy
> compiled" function that you could then apply to all elements in an array. Is
> that what "universal functions" are about? That's what it sounded like it
> was about to me at first, but then after looking closer it didn't seem like
> that.
>
> I guess I was hoping vectorize was some magical thing that could compile
> certain python expressions down into "numpy primitives". It seems like as
> long as you stuck to a limited set of math and logical operations on arrays,
> even if those expressions included calls to subroutines that were similarly
> restricted, there should be an automated way to convert such expressions
> into the equivalent of a numpy built-in. I guess it's similar to cython, but
> it wouldn't require a separate file or special compilation. I could just say
>
>     def capped_sqrt( n ):
>         if ( n > 100 ):
>             return 10
>         else:
>             return sqrt( n )
>     f = numpy.compile( lambda x: 0 if ( x < 10 ) else capped_sqrt( x ) )
>     numpy.map( f, a )
>
> or something like that, and it would all happen in a single pass within
> numpy, with no "python code" being run.
>
> Is that something that exists?
>
> I realize that for any function like this I could build it out of a
> sequence of individual numpy operations on the whole array, but it seems
> that in general that requires doing several passes, creating multiple temp
> arrays, and computing functions for elements where the results will later be
> thrown out. This might be just because I'm a numpy newbie, but it's not
> clear to me how to do the example above on an array of size N without
> computing N square roots, even if the number of elements in my array between
> 10 and 100 is much smaller than N. Granted, I could set things up so that
> by the time sqrt was applied it would only be computing, say, sqrt( 0 ) for
> all those values that were originally < 10 or > 100. But still, it would be
> nice not to compute unnecessarily at all. And efficiency concerns aside, it
> just seems unfortunate, language-wise, to always have to think cleverly
> about some sequence of whole-array transformations instead of just being
> able to think about the "single pass computation" that is in your head (at
> least is in my head) in the first place and is after all the most direct
> and non-redundant way to get the result.
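>
> For completeness, boolean masks do let you evaluate sqrt on exactly the
> elements that need it, at the cost of a masked read and write. A sketch of
> the capped_sqrt example above, on made-up data:

```python
import numpy as np

a = np.array([4.0, 9.0, 150.0, 25.0, 5.0])

# Default case first: elements < 10 map to 0.
result = np.zeros_like(a)

# Elements > 100 are capped at 10.
result[a > 100] = 10

# sqrt is computed only on the elements between 10 and 100,
# never on the ones that would be overwritten anyway.
mid = (a >= 10) & (a <= 100)
result[mid] = np.sqrt(a[mid])

print(result.tolist())  # [0.0, 0.0, 10.0, 5.0, 0.0]
```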
>
> Michael
>
>
I can't comment much on the questions you raise, but I can certainly suggest
that the np.where() function might be a much better choice than some of those
if-statements.
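
For example, nested np.where calls turn that if/elif/else chain into a single
vectorized expression. A sketch, with placeholder integer color values
standing in for whatever color_to_int() returns:

```python
import numpy as np

# Placeholder color values; in the original code these come from color_to_int().
green, blue, gray = 1, 2, 3
BNA_LAND_FEATURE_CODE = 0
BNA_WATER_FEATURE_CODE = 1
unmapped_colors = np.array((0, 0, 1, 1, 2, 0, 0, 1))

# Each np.where evaluates its condition over the whole array at once,
# so two nested calls replace the Python-level if/elif/else.
mapped_colors = np.where(unmapped_colors == BNA_LAND_FEATURE_CODE, green,
                np.where(unmapped_colors == BNA_WATER_FEATURE_CODE, blue, gray))
print(mapped_colors.tolist())  # [1, 1, 2, 2, 3, 1, 1, 2]
```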

Ben Root

