Hi Everyone,

I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]

Does anyone know how to do this without using a double loop?

Thanks,
Geoffrey
Geoffrey Zhu wrote:
Hi Everyone,
I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]
Does anyone know how to do this without using a double loop?
If you can code your function such that it only uses operations that broadcast (i.e. operators and ufuncs) and avoids things like branching or loops, then you can just use numpy.newaxis on the first array.

from numpy import array, newaxis
x = array([1, 2, 3])
y = array([10, 100, 1000])
f(x[:,newaxis], y)

Otherwise, you can use numpy.vectorize() to turn your function into one that will do that broadcasting for you.

from numpy import array, newaxis, vectorize
x = array([1, 2, 3])
y = array([10, 100, 1000])
f = vectorize(f)
f(x[:,newaxis], y)

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
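As a concrete illustration of the broadcasting approach, here is a minimal sketch with a hypothetical pairwise function standing in for the OP's f (the function name and body are just placeholders):

```python
import numpy as np

# Hypothetical pairwise function -- a placeholder for the OP's f.
# It broadcasts because it only uses elementwise numpy operations.
def f(a, b):
    return a * np.log10(b)

x = np.array([1, 2, 3])
y = np.array([10, 100, 1000])

# x[:, np.newaxis] has shape (3, 1); broadcasting it against y (shape (3,))
# yields the (3, 3) table [[f(xi, yj) for yj in y] for xi in x].
result = f(x[:, np.newaxis], y)
print(result)
# [[1. 2. 3.]
#  [2. 4. 6.]
#  [3. 6. 9.]]
```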
Robert Kern wrote:
If you can code your function such that it only uses operations that broadcast (i.e. operators and ufuncs) and avoids things like branching or loops, then you can just use numpy.newaxis on the first array.
from numpy import array, newaxis
x = array([1, 2, 3])
y = array([10, 100, 1000])
f(x[:,newaxis], y)
in fact, it may make sense to just have your x be a column vector anyway:
>>> x
array([1, 2, 3])
>>> y
array([10, 11, 12])
>>> x.shape = (-1,1)
>>> x
array([[1],
       [2],
       [3]])
>>> x * y
array([[10, 11, 12],
       [20, 22, 24],
       [30, 33, 36]])
Broadcasting is VERY cool!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
7600 Sand Point Way NE
Seattle, WA 98115
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
Chris.Barker@noaa.gov
On 8/20/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
Hi Everyone,
I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]
You could make two matrices like so:

In [46]: a = arange(3)
In [47]: b = a.reshape(1,3).repeat(3,0)
In [48]: c = a.reshape(3,1).repeat(3,1)
In [49]: b
Out[49]:
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
In [50]: c
Out[50]:
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

which will give you all pairs. You can then make a function of these in various ways, for example

In [52]: c**b
Out[52]:
array([[1, 0, 0],
       [1, 1, 1],
       [1, 2, 4]])

That is a bit clumsy, though. I don't know how to do what you want in a direct way.

Chuck
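For what it's worth, a sketch of the same pair construction using numpy.meshgrid (assuming a NumPy recent enough to have the `indexing` argument); the variable names just mirror the example above:

```python
import numpy as np

a = np.arange(3)
# indexing='ij' makes c vary along rows and b along columns, matching the
# reshape/repeat construction above
c, b = np.meshgrid(a, a, indexing='ij')
print(c ** b)
# [[1 0 0]
#  [1 1 1]
#  [1 2 4]]
```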
On 8/20/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
Hi Everyone,
I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]
Maybe something like

In [15]: f = lambda x,y : x*sin(y)
In [16]: a = array([[f(i,j) for i in range(3)] for j in range(3)])
In [17]: a
Out[17]:
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.84147098,  1.68294197],
       [ 0.        ,  0.90929743,  1.81859485]])

I don't know if nested list comprehensions are faster than two nested loops, but at least they avoid array indexing.

Chuck
On 8/21/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
On 8/20/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
Hi Everyone,
I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]
Maybe something like
In [15]: f = lambda x,y : x*sin(y)
In [16]: a = array([[f(i,j) for i in range(3)] for j in range(3)])
In [17]: a
Out[17]:
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.84147098,  1.68294197],
       [ 0.        ,  0.90929743,  1.81859485]])
I don't know if nested list comprehensions are faster than two nested loops, but at least they avoid array indexing.
This is just a general comment on recent threads of this type and not directed specifically at Chuck or anyone else.

IMO, the emphasis on avoiding FOR loops at all costs is misplaced. It is often more memory friendly and thus faster to vectorize only the inner loop and leave outer loops alone. Everything varies with the specific case of course, but trying to avoid FOR loops on principle is not a good strategy.

--
tim.hochberg@ieee.org
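As a minimal sketch of the pattern Tim describes (the function body and array sizes are arbitrary placeholders), vectorizing only the inner loop removes the per-element Python overhead while keeping temporaries small:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1000)
y = np.linspace(0.0, 1.0, 2000)

out = np.empty((x.size, y.size))
# Outer loop stays in Python; the inner loop is one vectorized expression
# per row, so peak temporary memory is O(len(y)) rather than O(len(x)*len(y)).
for i, xi in enumerate(x):
    out[i] = np.sin(xi * y) + xi**2   # stand-in for f(xi, y)
```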
On 8/21/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 8/21/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
On 8/20/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
Hi Everyone,
I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]
Maybe something like
In [15]: f = lambda x,y : x*sin(y)
In [16]: a = array([[f(i,j) for i in range(3)] for j in range(3)])
In [17]: a
Out[17]:
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.84147098,  1.68294197],
       [ 0.        ,  0.90929743,  1.81859485]])
I don't know if nested list comprehensions are faster than two nested loops, but at least they avoid array indexing.
This is just a general comment on recent threads of this type and not directed specifically at Chuck or anyone else.
IMO, the emphasis on avoiding FOR loops at all costs is misplaced. It is often more memory friendly and thus faster to vectorize only the inner loop and leave outer loops alone. Everything varies with the specific case of course, but trying to avoid FOR loops on principle is not a good strategy.
I agree. My original post asked for solutions without using two nested for loops because I already know the two-for-loop solution. Besides, I was hoping that some version of 'outer' would take a function reference and call the function instead of doing multiplication.
On 8/21/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
On 8/21/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 8/21/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
On 8/20/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
Hi Everyone,
I am wondering if there is an "extended" outer product. Take the example in "Guide to Numpy." Instead of doing a multiplication, I want to call a custom function for each pair.

print outer([1,2,3],[10,100,1000])

[[  10  100 1000]
 [  20  200 2000]
 [  30  300 3000]]

So I want:

[[f(1,10), f(1,100), f(1,1000)],
 [f(2,10), f(2,100), f(2,1000)],
 [f(3,10), f(3,100), f(3,1000)]]
Maybe something like
In [15]: f = lambda x,y : x*sin(y)
In [16]: a = array([[f(i,j) for i in range(3)] for j in range(3)])
In [17]: a
Out[17]:
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.84147098,  1.68294197],
       [ 0.        ,  0.90929743,  1.81859485]])
I don't know if nested list comprehensions are faster than two nested loops, but at least they avoid array indexing.
This is just a general comment on recent threads of this type and not directed specifically at Chuck or anyone else.
IMO, the emphasis on avoiding FOR loops at all costs is misplaced. It is often more memory friendly and thus faster to vectorize only the inner loop and leave outer loops alone. Everything varies with the specific case of course, but trying to avoid FOR loops on principle is not a good strategy.
I agree. My original post asked for solutions without using two nested for loops because I already know the two-for-loop solution. Besides, I was hoping that some version of 'outer' would take a function reference and call the function instead of doing multiplication.
A specific example would help here. There are ways to deal with certain subclasses of problems that won't necessarily generalize. For example, are you aware of the outer methods on ufuncs (add.outer, subtract.outer, etc.)? Typical dimensions also matter, since some approaches work well for certain shapes, but are pretty miserable for others.

FWIW, I often have very good luck with removing the inner for-loop in favor of vector operations. This tends to be simpler than trying to vectorize everything and often has better performance since it's often more memory friendly. However, it all depends on specifics of the problem.

Regards,

-tim

--
tim.hochberg@ieee.org
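For reference, a quick sketch of the ufunc .outer methods Tim mentions:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 100, 1000])

print(np.add.outer(x, y))        # pairwise sums, shape (3, 3)
print(np.subtract.outer(x, y))   # pairwise differences
print(np.multiply.outer(x, y))   # same as np.outer(x, y) for 1-D inputs
```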
On 21/08/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
This is just a general comment on recent threads of this type and not directed specifically at Chuck or anyone else.
IMO, the emphasis on avoiding FOR loops at all costs is misplaced. It is often more memory friendly and thus faster to vectorize only the inner loop and leave outer loops alone. Everything varies with the specific case of course, but trying to avoid FOR loops on principle is not a good strategy.
Yes and no. From a performance point of view, you are certainly right; vectorizing is definitely not always a speedup. But for me, the main advantage of vectorized operations is generally clarity: C = A*B is clearer and simpler than C = [a*b for (a,b) in zip(A,B)]. When it's not clearer and simpler, I feel no compunction about falling back to list comprehensions and for loops.

That said, it would often be nice to have something like map(f,arange(10)) for arrays; the best I've found is vectorize(f)(arange(10)). vectorize, of course, is a good example of my point above: it really just loops, in python IIRC, but conceptually it's extremely handy for doing exactly what the OP wanted. Unfortunately vectorize() does not yield a sufficiently ufunc-like object to support .outer(), as that would be extremely tidy.

Anne
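For concreteness, a small sketch of the map-like use of vectorize Anne describes; f here is a made-up scalar function with a branch, i.e. something that would not broadcast on its own:

```python
import numpy as np

def f(x):
    # scalar-only logic: the branch prevents plain array broadcasting
    return x**2 if x % 2 else -x

print(list(map(f, range(5))))         # [0, 1, -2, 9, -4]
print(np.vectorize(f)(np.arange(5)))  # [ 0  1 -2  9 -4]
```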
On 8/21/07, Anne Archibald <peridot.faceted@gmail.com> wrote:
On 21/08/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
This is just a general comment on recent threads of this type and not directed specifically at Chuck or anyone else.
IMO, the emphasis on avoiding FOR loops at all costs is misplaced. It is often more memory friendly and thus faster to vectorize only the inner loop and leave outer loops alone. Everything varies with the specific case of course, but trying to avoid FOR loops on principle is not a good strategy.
Yes and no. From a performance point of view, you are certainly right; vectorizing is definitely not always a speedup. But for me, the main advantage of vectorized operations is generally clarity: C = A*B is clearer and simpler than C = [a*b for (a,b) in zip(A,B)]. When it's not clearer and simpler, I feel no compunction about falling back to list comprehensions and for loops.
I always assume that in these cases performance is a driver of the question. It would be straightforward to code an outer equivalent in Python to hide this for anyone who cares. Since no one who asks these questions ever does, I assume they must be primarily motivated by performance.

That said, it would often be nice to have something like map(f,arange(10)) for arrays; the best I've found is vectorize(f)(arange(10)).
vectorize, of course, is a good example of my point above: it really just loops, in python IIRC,

I used to think that too, but then I looked at it and I believe it actually grabs the code object out of the function and loops in C. You still have to run the code object at each point, though, so it's not that fast. It's been a while since I did that looking, so I may be totally wrong.

but conceptually it's extremely handy for doing exactly what the OP wanted. Unfortunately vectorize() does not yield a sufficiently ufunc-like object to support .outer(), as that would be extremely tidy.

I suppose someone should fix that someday. However, I still think vectorize is an attractive nuisance in the sense that someone has a function that they want to apply to an array and they get sucked into throwing vectorize at the problem. More often than not, vectorize makes things slower than they need to be. If you don't care about performance, that's fine, but I live in fear of code like:

def f(a, b):
    return sin(a*b + a**2)
f = vectorize(f)

The original function f is a perfectly acceptable vectorized function (assuming one uses numpy.sin), but now it's been replaced by a slower version by passing it through vectorize. To be sure, this isn't always the case; in cases where you have to make choices, things get messier. Still, I'm not convinced that vectorize doesn't hurt more than it helps.

--
tim.hochberg@ieee.org
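A minimal sketch of that point, under the assumption that one uses numpy's sin: the original f already accepts arrays, and the vectorize()d copy produces the same values while paying a Python-level call per element.

```python
import numpy as np

def f(a, b):
    # already vectorized: numpy's sin and the arithmetic operators broadcast
    return np.sin(a * b + a**2)

f_slow = np.vectorize(f)   # now every element costs a Python function call

x = np.arange(3.0)
y = np.arange(3.0)
# identical results; the difference is purely speed on large inputs
print(np.allclose(f(x[:, np.newaxis], y), f_slow(x[:, np.newaxis], y)))  # True
```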
Timothy Hochberg wrote:
On 8/21/07, Anne Archibald <peridot.faceted@gmail.com> wrote:
but conceptually it's extremely handy for doing exactly what the OP wanted. Unfortunately vectorize() does not yield a sufficiently ufunc-like object to support .outer(), as that would be extremely tidy.
I suppose someone should fix that someday.
Not much to fix. There is already frompyfunc() which does make a real ufunc. However (and it's a big "however"), those ufuncs only output object arrays. That's why I didn't mention it earlier.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
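A short sketch of the frompyfunc() route Robert mentions, showing the object-dtype output (the function g is a made-up placeholder):

```python
import numpy as np

def g(a, b):
    return a * b + 1

ug = np.frompyfunc(g, 2, 1)   # a real ufunc: 2 inputs, 1 output
out = ug.outer([1, 2, 3], [10, 100, 1000])
print(out.dtype)              # object
print(out.astype(np.int64))
# [[  11  101 1001]
#  [  21  201 2001]
#  [  31  301 3001]]
```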
On Tue, Aug 21, 2007 at 02:14:00PM -0700, Timothy Hochberg wrote:
I suppose someone should fix that someday. However, I still think vectorize is an attractive nuisance in the sense that someone has a function that they want to apply to an array and they get sucked into throwing vectorize at the problem. More often than not, vectorize makes things slower than they need to be. If you don't care about performance, that's fine, but I live in fear of code like:
def f(a, b):
    return sin(a*b + a**2)
f = vectorize(f)
The original function f is a perfectly acceptable vectorized function (assuming one uses numpy.sin), but now it's been replaced by a slower version by passing it through vectorize. To be sure, this isn't always the case; in cases where you have to make choices, things get messier. Still, I'm not convinced that vectorize doesn't hurt more than it helps.
I often have code where I am going to loop over a large number of nested loops, something like:

# A function to return the optical field at each point:
def optical_field((x, y, z)):
    # loop over an array of laser wave-vectors
    # and return the optical field
    ...

# Evaluate the optical field on a grid to plot it:
x, y, z = mgrid[-10:10, -10:10, -10:10]
field = optical_field((x, y, z))

In such code every single operation could be vectorized, but the problem is that each function assumes the input array to be of a certain dimension. I may be using some code like:

r = c_[x, y, z]
cross(r, r_o)

So implementing loops with arrays is not that convenient, because I have to add dimensions to my arrays and make sure that my inner functions are robust to these extra dimensions. Looking at some of my code where I had this kind of problem, I see functions similar to:

def delta(r, v, k):
    return dot(r, transpose(k)) + Gaussian_beam(r) + dot(v, transpose(k))

I am starting to realize that the real problem is that there is no info about what the expected size of the input and output arguments should be. Given such info, the function could resize its input and output arguments. Maybe some clever decorators could be written to address this issue, something like:

@inputsize((3, -1), (3, -1), (3, -1))

which would reshape every input positional argument to the shape given in the list of shapes, and reshape the output argument to the shape of the first input argument.

As I worked around these problems in my code (I had not had the idea at the time), I cannot say whether these decorators would get rid of them, but I like the idea, and I will try it next time I run into these problems. I just wanted to point out that replacing for loops with arrays is not always that simple and that using "vectorize" is sometimes a quick and dirty way to get things done.

Gaël
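A rough sketch of the kind of decorator Gaël is imagining; the name inputsize and its exact semantics are hypothetical, and this version only reshapes the inputs (reshaping the output back is left out, since the right target shape depends on what the function returns):

```python
import functools
import numpy as np

def inputsize(*shapes):
    """Hypothetical decorator: reshape each positional argument to the
    corresponding shape before calling the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            reshaped = [np.asarray(a).reshape(s) for a, s in zip(args, shapes)]
            return func(*reshaped, **kwargs)
        return wrapper
    return decorator

# usage sketch: the function can assume points arrive as a (3, N) array
@inputsize((3, -1))
def field_strength(r):
    return np.sqrt((r ** 2).sum(axis=0))   # one value per point

x, y, z = np.mgrid[-1:1:5j, -1:1:5j, -1:1:5j]
values = field_strength(np.array([x, y, z]))   # shape (125,)
```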
Anne Archibald wrote:
On 21/08/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
This is just a general comment on recent threads of this type and not directed specifically at Chuck or anyone else.
IMO, the emphasis on avoiding FOR loops at all costs is misplaced. It is often more memory friendly and thus faster to vectorize only the inner loop and leave outer loops alone. Everything varies with the specific case of course, but trying to avoid FOR loops on principle is not a good strategy.
Yes and no. From a performance point of view, you are certainly right; vectorizing is definitely not always a speedup. But for me, the main advantage of vectorized operations is generally clarity: C = A*B is clearer and simpler than C = [a*b for (a,b) in zip(A,B)]. When it's not clearer and simpler, I feel no compunction about falling back to list comprehensions and for loops.
That said, it would often be nice to have something like map(f,arange(10)) for arrays; the best I've found is vectorize(f)(arange(10)).
vectorize, of course, is a good example of my point above: it really just loops, in python IIRC, but conceptually it's extremely handy for doing exactly what the OP wanted. Unfortunately vectorize() does not yield a sufficiently ufunc-like object to support .outer(), as that would be extremely tidy.
I'm not sure what you mean by sufficiently ufunc-like. In fact, vectorize is a ufunc (it's just an object-based one). Thus, it should produce what you want (as long as you use newaxis so that the broadcasting is done). If you just want it to support the .outer method, that could be easily done (as under the covers there is a real ufunc). I just overlooked adding these methods to the result of vectorize. The purpose of vectorize is to create a ufunc out of a scalar-based function, so I don't see any problem in giving them the methods of ufuncs as well (as long as the signature is right --- 2 inputs and 1 output).

-Travis
On 19/09/2007, Travis E. Oliphant <oliphant@enthought.com> wrote:
Anne Archibald wrote:
vectorize, of course, is a good example of my point above: it really just loops, in python IIRC, but conceptually it's extremely handy for doing exactly what the OP wanted. Unfortunately vectorize() does not yield a sufficiently ufunc-like object to support .outer(), as that would be extremely tidy.
I'm not sure what you mean by sufficiently ufunc-like. In fact, vectorize is a ufunc (it's just an object-based one). Thus, it should produce what you want (as long as you use newaxis so that the broadcasting is done). If you just want it to support the .outer method, that could be easily done (as under the covers there is a real ufunc).
I just overlooked adding these methods to the result of vectorize. The purpose of vectorize is to create a ufunc out of a scalar-based function, so I don't see any problem in giving them the methods of ufuncs as well (as long as the signature is right --- 2 inputs and 1 output).
Ah. You got it in one: I was missing the methods. It would be handy to have them back, not least because then I could just remember the rule "all binary ufuncs have .outer()". Do ternary ufuncs support outer()? It would presumably just generate a higher-rank array; for example, U.outer(arange(10), arange(11), arange(12)) would produce an array of shape (10, 11, 12)... maybe there aren't any ternary ufuncs yet, apart from the ones that are generated by vectorize(). I suppose ix_ provides an alternative, so that you could have

def outer(self, *args):
    return self(ix_(*args))

Still, I think for conceptual tidiness it would be nice if the ufuncs vectorize() makes supported the methods.

Thanks,
Anne
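For what it's worth, the ix_ workaround can also be wrapped as a standalone helper rather than a method; a small sketch (the name generic_outer is made up):

```python
import numpy as np

def generic_outer(func, *args):
    """Apply a broadcasting-capable (e.g. vectorize()d) function to the
    outer/cartesian combination of the argument vectors."""
    return func(*np.ix_(*args))

f = np.vectorize(lambda a, b: a * b)   # stand-in for an arbitrary scalar f
print(generic_outer(f, [1, 2, 3], [10, 100, 1000]))
# [[  10  100 1000]
#  [  20  200 2000]
#  [  30  300 3000]]
```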
participants (8)

- Anne Archibald
- Charles R Harris
- Christopher Barker
- Gael Varoquaux
- Geoffrey Zhu
- Robert Kern
- Timothy Hochberg
- Travis E. Oliphant