amap: new function / ufunc method?

Having written all this down, I'm half-expecting somebody is going to point out some simple way of doing this with existing builtin functions, but here goes... In the process of 'porting' Konrad Hinsen's Histogram class (from the Scientific package) to N dimensions, I ran across the problem of how to apply a function to an array for a specified set of indices. In this case, the problem is to increment an N-dimensional array for each of a set of indices. In general, it occurred to me that it would be useful to have a function (named 'amap' here) to loop over a specified set of indices, applying a function to a set of arguments taken from a set of argument arrays (similar to the map buitin function in this respect), and filling in the corresponding elements in a target array with the results. For example, here is a bit of the histogram problem: [The original 1D Histogram class uses a clever array-broadcasting trick which creates intermediate rank-2 arrays, feeding in the data in chunks to avoid quadratic behaviour. This is fine with 1D arrays, but in poor taste for the N dimensional case, if it works at all.] #!/usr/bin/env python import Numeric as N def flatten_indices(shape, indices): dims = indices.shape[0] facs = pow(shape, N.arange(dims))[::-1,N.NewAxis] return N.add.reduce(facs*indices) def amap(target, function, indices, *arg_arrays): indices = N.asarray(indices) arg_arrays = map(N.asarray, arg_arrays) if not isinstance(target, N.ArrayType): raise TypeError, "target must be an array" if not target.iscontiguous(): raise ValueError, "target must be contiguous" if not N.alltrue([aa.iscontiguous() for aa in arg_arrays]): raise ValueError, "all arg_arrays must be contiguous" if not len(indices.shape) == 1: raise ValueError, "indices must be rank-1" tf = target.flat for index in indices: args = [arg_array.flat[index] for arg_array in arg_arrays] tf[index] = function(*args) # construct a 2 dimensional histogram given indices, ind, of data points nbins = (5, 4) # 5 bins along one axis, 4 along the other histo = N.zeros(nbins) # empty histogram # these are the indices into the histogram at which data points lie: # histo[1,0] should be incremented twice, histo[2,2] should be incremented # once, etc. ind = N.array([[1, 1, 2, 4], [0, 0, 2, 3]]) flat_ind = flatten_indices(histo.shape, ind) assert flat_ind == N.array([4, 4, 10, 19]) # Now we want to increment histo for each index pair ind[:,n] (in this # 2D case) def increment(x): return x+1 amap(histo, increment, flat_ind, histo) print histo Actually, I just realised that calling it something related to 'map' is rather misleading, because the builtin Python map generates a list of the same length as the sequence passed in; this function doesn't: it 'generates' (fills in, in fact) an array of the same size as the target passed in. Can't think of a better name, though. So: is there a way of doing this (implementing amap) efficiently already? I don't think there is, but I'd be interested to be proved wrong. I admit I don't like the relatively large number of arguments, but I still think it is worthwhile. You could make the *arg_arrays a single argument, and make target an optional argument, at the end of the arguments: def amap(function, indices, arg_arrays, target=None): so that by default, the target array is first created as zeros, then filled in and returned, or perhaps even better: def amap(function, arg_arrays, indices=None, target=None): though the special case where you don't give the indices argument is already achievable without this function, so it could be argued that hiding away the indices argument as a mere optional argument obscures the real point of the function. The last form also has the (debatable) benefit of being similar to the built-in Python map: def map(function, *args): The example call above then looks like this: histo = amap(increment, [histo], indices=flat_ind) Another couple of tweaks might be to allow scalar values to replace the arg_arrays argument, and to allow normal indices rather than flattened ones (the only reason I started out with flat indices was that put() did the same): def amap(function, args, indices=None, target=None): So that the example call looks like: histo = amap(increment, [histo], indices=ind) or histo = amap(Numeric.add, [histo, 1], ind) I like those last two. Even if there is already an efficient way of doing this with existing Numeric functions, I think this is a natural way of thinking about some operations on arrays -- this one, at least -- and would be useful to have in Numeric. For real efficiency, I suppose the right thing to do would be to have a map method on ufuncs, as is the case for reduce: histo = Numeric.add.map([histo, 1], ind) but it would be more work to implement than just a standard function, and less general. I've made a first attempt at the function, in C (will upload diff against 20.2.0 if anyone so requests). It only accepts one arg array, probably doesn't really do exactly the same as the Python function above (for better or worse), and it's a mess of a function, since I rarely write anything in C. It is also incorrect in at least some ways (most obviously, it only deals with Python floats / C doubles / Numpy PyArray_DOUBLEs), though there are some tests (which pass). I'm wasn't sure how fast it would be, since after all it's only using ordinary Python functions, not ufuncs, but my crude tests appeared to show an improvement of about a factor of two over an optimised version of the Python function above. The slow bit seems to be constructing the argument list / passing arg list to function, so I removed that to make the comparison, along with an unecessary attribute access in the loop. Presumably the general (> 1 arg_array) version would show a bigger improvement over the corresponding Python version. By the way, does anyone understand what the reduceat ufunc method is supposed to do? Seems to be the Beale cipher of Numeric -- having studied its output to see if it was the function I was looking for, I can find no sense in it and kind of suspect that it was constructed only to keep us puzzling. ;) Is Jim Hugunin not available for comment on the matter, or is this somebody else's dirty work? John

Hy, this is my first attempt to give some back to the python/numerical-python community. Here (http://kdfio.sourceforge.net) you can find an early release of my I/O routine for reading and writing kdf file as in khoros environment. Khoros (from http://www.khoral.com) is an interactive program for digital image processing, we are using in our lab (university of heidelberg). All the time i've found hard to use such environment because it lacks some flexibility i need. Now i've been using numerical python for some time and i would other will use it as standard base for numerical computation, but for my purpose it was lacking I/O with such an environment (khoros). Khoros is not free, but anyway if you are using it i hope you will like these routines to read and write their files. These are in an early stage, so expect a lot of bugs, missing feature and so on: it's only my first project so feedbacks, suggestions and complains are welcome. Ops, i've forgotten: i'm personally using this rotines for my PHD, so they should be reasonably stable. The license should be BSD? GPL? LGPL? any comment is welcome, because i never understood these problem (it is my first project, didn't i tell you?). best regards to all GREAT folks in this group, antonio cavallo ps. you can download khoros if you are a student from: http://www.khoral.com/khoros/kp2001_student

cavallo@kip.uni-heidelberg.de <cavallo@kip.uni-heidelberg.de>: ...
If you don't specifically want to follow the philosophy of the GPL, I'd suggest that you go with some BSD variant (e.g. MIT, which is even simpler). That way, you place less restrictions on its use/distributions, and more people can/will use it. (YMMV, of course.) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org

If I understand what you are trying to do, there is a function called arraymap in the SciPy package (special module) (it was in the Numeric source for a short time but I think we decided to take it out --- I can't remember why). I think that using this function and a combination of take and put can do what you describe. It is available as scipy.special.arraymap Best regards, Travis Oliphant

On Sun, 27 Jan 2002, Travis Oliphant wrote: [...]
I don't see how, though I may very well be missing something. To restate the problem very quickly (though the Python function I posted is much clearer -- it's only two or three lines of actual code): for every element in a list of array indices, the element in an output array at that index needs to be incremented; the tricky part is that this needs to happen once for *every time the index appears in the list*. Eg., in a 1D example, [0,1,1,1,4,5] might give you [1,3,0,0,1,1] (again, see my previous post for an actual tested example) I don't see how you can do this with arraymap, take and put. John

Hy, this is my first attempt to give some back to the python/numerical-python community. Here (http://kdfio.sourceforge.net) you can find an early release of my I/O routine for reading and writing kdf file as in khoros environment. Khoros (from http://www.khoral.com) is an interactive program for digital image processing, we are using in our lab (university of heidelberg). All the time i've found hard to use such environment because it lacks some flexibility i need. Now i've been using numerical python for some time and i would other will use it as standard base for numerical computation, but for my purpose it was lacking I/O with such an environment (khoros). Khoros is not free, but anyway if you are using it i hope you will like these routines to read and write their files. These are in an early stage, so expect a lot of bugs, missing feature and so on: it's only my first project so feedbacks, suggestions and complains are welcome. Ops, i've forgotten: i'm personally using this rotines for my PHD, so they should be reasonably stable. The license should be BSD? GPL? LGPL? any comment is welcome, because i never understood these problem (it is my first project, didn't i tell you?). best regards to all GREAT folks in this group, antonio cavallo ps. you can download khoros if you are a student from: http://www.khoral.com/khoros/kp2001_student

cavallo@kip.uni-heidelberg.de <cavallo@kip.uni-heidelberg.de>: ...
If you don't specifically want to follow the philosophy of the GPL, I'd suggest that you go with some BSD variant (e.g. MIT, which is even simpler). That way, you place less restrictions on its use/distributions, and more people can/will use it. (YMMV, of course.) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org

If I understand what you are trying to do, there is a function called arraymap in the SciPy package (special module) (it was in the Numeric source for a short time but I think we decided to take it out --- I can't remember why). I think that using this function and a combination of take and put can do what you describe. It is available as scipy.special.arraymap Best regards, Travis Oliphant

On Sun, 27 Jan 2002, Travis Oliphant wrote: [...]
I don't see how, though I may very well be missing something. To restate the problem very quickly (though the Python function I posted is much clearer -- it's only two or three lines of actual code): for every element in a list of array indices, the element in an output array at that index needs to be incremented; the tricky part is that this needs to happen once for *every time the index appears in the list*. Eg., in a 1D example, [0,1,1,1,4,5] might give you [1,3,0,0,1,1] (again, see my previous post for an actual tested example) I don't see how you can do this with arraymap, take and put. John
participants (4)
-
cavallo@kip.uni-heidelberg.de
-
John J. Lee
-
Magnus Lie Hetland
-
Travis Oliphant