[Matrix-SIG] Proposal: Array of arrays and non-copying slices
Edward C. Jones
edcjones@erols.com
Sat, 19 Dec 1998 19:52:17 -0500
I have always liked the "reduceat" method. It is powerful and
useful. The
NumPy documentation is wrong when it says "This is a weird
function, and
most people should just ignore it." This method essentially
divides an array
into blocks (along one axis only) and calls "reduce" on each one.
Operating on
blocks like this is very common in spreadsheets, numerical
analysis, image
processing, and computer graphics.
I feel that NumPy can be made more useful and "reduceat" more
intuitive by
adding:
1. A non-copying slice operation in addition to the current
copying slice
operation.
With this, one can filter part of an image or average part of a
spreadsheet
without expensive copying operations. The non-copying slice can
also be used to
divide an array into an array of blocks.
What is a good notation for this? There would be a flame war if I
proposed
using {} instead of [] for non-copying arrays. So perhaps, if
a = [0,1,2,3,4,5], let a.sub[1:4] == [1,2,3].
2. The ability to treat an array of arrays as a single array in
either of two
senses as explained below.
Consider the arrays:
a = [[0 1]
[2 3]]
b = [[10 11]
[12 13]]
c = [[20 21]
[22 23]]
d = [[30 31]
[32 33]]
abcd = [[ 0 1 10 11] (1)
[ 2 3 12 13]
[20 21 30 31]
[22 23 32 33]]
Suppose I start with the small arrays a, b, c, and d. Currently I
can create
an array of arrays by
A = Numeric.array([[0,0],[0,0]], 'O') # Entries can be any Python
object.
A[0][0] = a
A[0][1] = b
A[1][0] = c
A[1][1] = d
This can viewed as adding dimensions. Indeed, "print A" gives
[[[[0 1] [2 3]] (2)
[[10 11] [12 13]] ]
[[[20 21] [22 23]]
[[30 31] [32 33]] ]]
which has shape (2,2,2,2) (but A.shape actually is (2,2)).
But it can be also visualized as creating the array abcd of shape
(4,4).
This second interpretation is would be useful to many users. For
example, in
numerical analysis arrays are often divided into blocks in order
to prevent
thrashing or as part of an fft or other algorithm. Moreover, in
computer
vision, image processing and computer graphics pyramid or
quadtree methods
are widely used.
One can also start with an actual array of shape (2,2,2,2) or
(4,4) and
think of it as an array of arrays. One could then start with a
single
actual array and operate on it as if it were an array of arrays.
Currently, one character strings are used for typecodes in
defining arrays.
Add the new typecodes 'A' and 'B' where 'A' defines an array of
arrays
interpreted as adding dimensions and 'B' defines an array of
arrays
interpreted as a tiling. So AA = Numeric.array([[a,b],[c,d]],
'A') would
print as (2) above while BB = Numeric.array([[a,b],[c,d]], 'B')
would print
as (1) above. Subscripts would behave according to
interpretation:
AA[1,0,1,0] == 22 and BB[1,2] == 12.
To divide an array into an array of blocks define an array of
non-copying
subarrays and call Numeric.array. This is a painful way to do a
common
operation. I suggest using "indices" as used in the "reduceat"
function.
To divide an 8 by 8 array, A88, into 4 by 4 blocks one might use
something
like Numeric.array.blocks(A88, ((0,4), (0,4)) ) or
A.blocks( ((0,4), (0,4)) ).
Now "reduceat" becomes intuitive: "reduceat" called for an array
of arrays
does a "reduce" on each array in the array of arrays.