
Having written all this down, I'm half-expecting somebody is going to point out some simple way of doing this with existing builtin functions, but here goes... In the process of 'porting' Konrad Hinsen's Histogram class (from the Scientific package) to N dimensions, I ran across the problem of how to apply a function to an array for a specified set of indices. In this case, the problem is to increment an N-dimensional array for each of a set of indices. In general, it occurred to me that it would be useful to have a function (named 'amap' here) to loop over a specified set of indices, applying a function to a set of arguments taken from a set of argument arrays (similar to the map buitin function in this respect), and filling in the corresponding elements in a target array with the results. For example, here is a bit of the histogram problem: [The original 1D Histogram class uses a clever array-broadcasting trick which creates intermediate rank-2 arrays, feeding in the data in chunks to avoid quadratic behaviour. This is fine with 1D arrays, but in poor taste for the N dimensional case, if it works at all.] #!/usr/bin/env python import Numeric as N def flatten_indices(shape, indices): dims = indices.shape[0] facs = pow(shape, N.arange(dims))[::-1,N.NewAxis] return N.add.reduce(facs*indices) def amap(target, function, indices, *arg_arrays): indices = N.asarray(indices) arg_arrays = map(N.asarray, arg_arrays) if not isinstance(target, N.ArrayType): raise TypeError, "target must be an array" if not target.iscontiguous(): raise ValueError, "target must be contiguous" if not N.alltrue([aa.iscontiguous() for aa in arg_arrays]): raise ValueError, "all arg_arrays must be contiguous" if not len(indices.shape) == 1: raise ValueError, "indices must be rank-1" tf = target.flat for index in indices: args = [arg_array.flat[index] for arg_array in arg_arrays] tf[index] = function(*args) # construct a 2 dimensional histogram given indices, ind, of data points nbins = (5, 4) # 5 bins along one axis, 4 along the other histo = N.zeros(nbins) # empty histogram # these are the indices into the histogram at which data points lie: # histo[1,0] should be incremented twice, histo[2,2] should be incremented # once, etc. ind = N.array([[1, 1, 2, 4], [0, 0, 2, 3]]) flat_ind = flatten_indices(histo.shape, ind) assert flat_ind == N.array([4, 4, 10, 19]) # Now we want to increment histo for each index pair ind[:,n] (in this # 2D case) def increment(x): return x+1 amap(histo, increment, flat_ind, histo) print histo Actually, I just realised that calling it something related to 'map' is rather misleading, because the builtin Python map generates a list of the same length as the sequence passed in; this function doesn't: it 'generates' (fills in, in fact) an array of the same size as the target passed in. Can't think of a better name, though. So: is there a way of doing this (implementing amap) efficiently already? I don't think there is, but I'd be interested to be proved wrong. I admit I don't like the relatively large number of arguments, but I still think it is worthwhile. You could make the *arg_arrays a single argument, and make target an optional argument, at the end of the arguments: def amap(function, indices, arg_arrays, target=None): so that by default, the target array is first created as zeros, then filled in and returned, or perhaps even better: def amap(function, arg_arrays, indices=None, target=None): though the special case where you don't give the indices argument is already achievable without this function, so it could be argued that hiding away the indices argument as a mere optional argument obscures the real point of the function. The last form also has the (debatable) benefit of being similar to the built-in Python map: def map(function, *args): The example call above then looks like this: histo = amap(increment, [histo], indices=flat_ind) Another couple of tweaks might be to allow scalar values to replace the arg_arrays argument, and to allow normal indices rather than flattened ones (the only reason I started out with flat indices was that put() did the same): def amap(function, args, indices=None, target=None): So that the example call looks like: histo = amap(increment, [histo], indices=ind) or histo = amap(Numeric.add, [histo, 1], ind) I like those last two. Even if there is already an efficient way of doing this with existing Numeric functions, I think this is a natural way of thinking about some operations on arrays -- this one, at least -- and would be useful to have in Numeric. For real efficiency, I suppose the right thing to do would be to have a map method on ufuncs, as is the case for reduce: histo = Numeric.add.map([histo, 1], ind) but it would be more work to implement than just a standard function, and less general. I've made a first attempt at the function, in C (will upload diff against 20.2.0 if anyone so requests). It only accepts one arg array, probably doesn't really do exactly the same as the Python function above (for better or worse), and it's a mess of a function, since I rarely write anything in C. It is also incorrect in at least some ways (most obviously, it only deals with Python floats / C doubles / Numpy PyArray_DOUBLEs), though there are some tests (which pass). I'm wasn't sure how fast it would be, since after all it's only using ordinary Python functions, not ufuncs, but my crude tests appeared to show an improvement of about a factor of two over an optimised version of the Python function above. The slow bit seems to be constructing the argument list / passing arg list to function, so I removed that to make the comparison, along with an unecessary attribute access in the loop. Presumably the general (> 1 arg_array) version would show a bigger improvement over the corresponding Python version. By the way, does anyone understand what the reduceat ufunc method is supposed to do? Seems to be the Beale cipher of Numeric -- having studied its output to see if it was the function I was looking for, I can find no sense in it and kind of suspect that it was constructed only to keep us puzzling. ;) Is Jim Hugunin not available for comment on the matter, or is this somebody else's dirty work? John