[Numpy-discussion] Broadcasting rules (Ticket 76).

Thu Apr 27 00:00:05 EDT 2006

As Sasha quite clearly pointed out, when you do aggregation, you really 
do want to reduce the dimensionality of your data. In fact, that's 
something that always bit me with MATLAB. If I had a matrix that 
happened to have a dimension of 1, MATLAB would interpret it as a 
vector. I ended up writing functions like "SumColumns" that would check 
if it was a single row vector before calling sum, so that I wouldn't 
suddenly get a scalar result if a matrix happened to have one row.
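
For comparison, here is a minimal sketch of what a "SumColumns"-style
helper might look like in current NumPy. The name sum_columns and the
np import alias are illustrative assumptions, not the original code.
With an explicit axis, the result should stay rank-1 even when the
matrix has only one row, so no special-casing is needed:

```python
import numpy as np


def sum_columns(a):
    """Sum each column of a rank-2 array; the result is always rank 1."""
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError("expected a rank-2 array")
    # Reducing over axis 0 removes that axis only, whatever its length.
    return a.sum(axis=0)


one_row = np.array([[1, 2, 3]])               # shape (1, 3)
many_rows = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

one_row_sums = sum_columns(one_row)      # shape (3,), not a scalar
many_row_sums = sum_columns(many_rows)   # shape (3,)
```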

Once you reduce dimensionality with aggregating functions, I can see how 
it would be natural to want to use broadcasting to merge the reduced 
data and full data. However, I can't see how you could do that cleanly.

How is the code to know whether a rank-1 array represents a column or 
row when multiplied with a rank-2 array? There is simply no way to know, 
in general. I suppose we could define a convention, like:

"rank-1 arrays will be interpreted as row vectors for broadcasting."

etc. for higher dimensions.

However, I've found that even in my own code, no single convention 
makes the most sense for every application, so I'm just as happy to 
make the intent explicit with a lot of calls like:

v.shape = (-1, 1)
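
As a sketch of how that looks when merging reduced data back into the
full data: the array names below are made up for illustration, and the
example uses reshape() instead of assigning to .shape, which should
give the same shapes without modifying the original array.

```python
import numpy as np

m = np.arange(1.0, 7.0).reshape(2, 3)   # full data, shape (2, 3)

row_totals = m.sum(axis=1)   # reduced over columns, shape (2,)
col_totals = m.sum(axis=0)   # reduced over rows, shape (3,)

# State explicitly whether each reduced array is a column or a row.
as_column = row_totals.reshape(-1, 1)   # shape (2, 1)
as_row = col_totals.reshape(1, -1)      # shape (1, 3)

row_fractions = m / as_column   # each row now sums to 1
col_fractions = m / as_row      # each column now sums to 1
```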

NOTE:

It appears that numpy does, in fact, use such a convention:

 >>> v = N.arange(5)
 >>> m = N.ones((5,5))
 >>> v * m
array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])
 >>> v.shape = (-1,1)
 >>> v * m
array([[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]])
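
The session above is consistent with shapes being aligned from the
trailing axis, so a rank-1 array of length n behaves like shape (1, n).
A square matrix hides the difference, so here is a small sketch with a
non-square one, assuming a current NumPy imported as np rather than N:

```python
import numpy as np

m = np.ones((2, 3))

# A length-3 array lines up with the last axis of m: treated as a row.
row_like = np.arange(3)
broadcast_shape = np.broadcast(row_like, m).shape
row_product = row_like * m

# A length-2 array does not line up with the last axis, so it fails
# unless it is explicitly reshaped into a column.
col_like = np.arange(2)
try:
    col_like * m
    rank1_as_column_failed = False
except ValueError:
    rank1_as_column_failed = True

col_product = col_like.reshape(-1, 1) * m
```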

So what's the disagreement about?

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov