My head is about to explode.
I have an M by N array of floats. Associated with the columns are character labels ['a','b','b','c','d','e','e','e'] (note: already sorted, so duplicates are contiguous).
I want to replace the 2 'b' columns with the sum of the 2 columns. Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
The resulting array still has M rows but fewer than N columns. Anyone? Couldn't be any harder than Sudoku.
Mathew
On 8/29/06, Mathew Yeates wrote:
> I have an M by N array of floats. Associated with the columns are character labels ['a','b','b','c','d','e','e','e'] (note: already sorted, so duplicates are contiguous).
> I want to replace the 2 'b' columns with the sum of the 2 columns. Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
Make a cumsum of the array. Find the index of the last 'a', last 'b', etc and make the reduced array from that. Then take the diff of the columns. I know that's vague, but so is my understanding of python/numpy. Or even more vague: make a function that does what you want.
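The cumsum-and-diff idea above can be sketched as follows. This is only an illustration, assuming a current NumPy imported as `np` and labels whose duplicates are contiguous; the function name `sum_label_groups_cumsum` is hypothetical and not part of the thread.

```python
import numpy as np

def sum_label_groups_cumsum(data, labels):
    """Sum contiguous same-label columns using cumsum followed by diff.

    Assumes at least one column and that equal labels are adjacent.
    """
    labels = np.asarray(labels)
    # True at the last column of each run of identical labels
    is_last = np.append(labels[1:] != labels[:-1], True)
    last_idx = np.flatnonzero(is_last)
    # Running total along the columns, sampled at the end of each run
    csum = np.cumsum(data, axis=1)[:, last_idx]
    # Prepend a zero column so the first difference is the first group's sum
    zeros = np.zeros((csum.shape[0], 1), dtype=csum.dtype)
    return np.diff(np.hstack([zeros, csum]), axis=1)

data = np.arange(16.0).reshape(2, 8)
labels = ['a', 'b', 'b', 'c', 'd', 'e', 'e', 'e']
reduced = sum_label_groups_cumsum(data, labels)
print(reduced)
```

One caveat with this approach: each group sum is obtained by subtracting two running totals, so for floats of widely varying magnitude it may be slightly less accurate than summing each group directly.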
On 8/29/06, Keith Goodman wrote:
> On 8/29/06, Mathew Yeates wrote:
>> I have an M by N array of floats. Associated with the columns are character labels ['a','b','b','c','d','e','e','e'] (note: already sorted, so duplicates are contiguous).
>> I want to replace the 2 'b' columns with the sum of the 2 columns. Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
> Make a cumsum of the array. Find the index of the last 'a', last 'b', etc and make the reduced array from that. Then take the diff of the columns.
> I know that's vague, but so is my understanding of python/numpy.
> Or even more vague: make a function that does what you want.
Or you could use searchsorted on the labels to get a sequence of ranges. What you have is a sort of binning applied to columns instead of values in a vector. Or, if the overhead isn't too much, use a dictionary with (key: array) entries. Index through the columns adding keys: when the key is new, insert a copy of the column; when it is already present, add the new column to the old one.
Chuck
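Both suggestions above can be sketched roughly as below. This assumes a current NumPy imported as `np`; the function names are hypothetical, the `searchsorted` variant relies on the labels being sorted (as in the original post), and the dictionary variant relies on dicts preserving insertion order (Python 3.7+).

```python
import numpy as np

def sum_label_groups_searchsorted(data, labels):
    """Use searchsorted on sorted labels to find each label's column range."""
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)
    # [start, stop) column range occupied by each distinct label
    starts = np.searchsorted(labels, unique_labels, side="left")
    stops = np.searchsorted(labels, unique_labels, side="right")
    sums = [data[:, lo:hi].sum(axis=1) for lo, hi in zip(starts, stops)]
    # column_stack turns each 1-d sum into one column of the result
    return unique_labels, np.column_stack(sums)

def sum_label_groups_dict(data, labels):
    """Accumulate columns in a dictionary keyed by label."""
    sums = {}
    for j, key in enumerate(labels):
        if key in sums:
            sums[key] += data[:, j]          # label seen before: add to it
        else:
            sums[key] = data[:, j].copy()    # new label: store a copy
    keys = list(sums)
    return keys, np.column_stack([sums[k] for k in keys])

data = np.arange(16.0).reshape(2, 8)
labels = ['a', 'b', 'b', 'c', 'd', 'e', 'e', 'e']
names_ss, reduced_ss = sum_label_groups_searchsorted(data, labels)
names_d, reduced_d = sum_label_groups_dict(data, labels)
```

The dictionary variant does not need the labels to be sorted or contiguous, at the cost of a Python-level loop over all N columns.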
Mathew Yeates wrote:
> My head is about to explode.
> I have an M by N array of floats. Associated with the columns are character labels ['a','b','b','c','d','e','e','e'] (note: already sorted, so duplicates are contiguous).
> I want to replace the 2 'b' columns with the sum of the 2 columns. Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
> The resulting array still has M rows but fewer than N columns. Anyone? Couldn't be any harder than Sudoku.
Hi, I don't have time for this ;-) , but I learnt something useful along the way...
    import numpy as n

    m = n.ones([2, 6])
    a = ['b', 'c', 'c', 'd', 'd', 'd']
    # first index of each distinct label; a set is unordered, so sort
    # the start indices to keep the output columns in label order
    startindices = sorted(set([a.index(x) for x in a]))
    out = n.empty([m.shape[0], 0])
    for i in startindices:
        # sum the block of columns that share the label a[i];
        # n.mat keeps the result a column vector
        temp = n.mat(m[:, i : i + a.count(a[i])]).sum(axis = 1)
        out = n.hstack([out, temp])
    print(out)
Not sure if axis = 1 is needed, but until the defaults have settled a bit it can't hurt. You need python 2.4 for the built-in <set>, and <out> will be a numpy matrix; use <asarray> if you don't like that. But here it's really nice to work with matrices, because otherwise .sum() will give you a 1-d array sometimes, and that will suddenly look like a row to <hstack> (instead of a nice column vector) and wouldn't work -- that's why matrices are so great and everybody should be using them ;-)
hth,
sven
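The 1-d versus column-vector point can be illustrated with plain arrays. This is a small sketch, assuming a NumPy version in which `sum` accepts `keepdims` (not available when this thread was written); the variable names are made up for the example.

```python
import numpy as np

m = np.ones((2, 6))
block = m[:, 1:3]                        # two columns sharing one label

flat = block.sum(axis=1)                 # plain ndarray sum drops the axis
col = block.sum(axis=1, keepdims=True)   # keepdims keeps a column vector

print(flat.shape)   # 1-d result
print(col.shape)    # 2-d column

# The 2-d column can be appended to an M-by-0 array, as in the loop above
out = np.hstack([np.empty((m.shape[0], 0)), col])
```

With `keepdims=True` the result stays an ordinary ndarray, so no conversion with `asarray` is needed afterwards.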
On 8/30/06, Sven Schreiber wrote:
> [...] <out> will be a numpy matrix, use <asarray> if you don't like that. But here it's really nice to work with matrices, because otherwise .sum() will give you a 1-d array sometimes, and that will suddenly look like a row to <hstack> (instead of a nice column vector) and wouldn't work -- that's why matrices are so great and everybody should be using them ;-)
column_stack would work perfectly in place of hstack there, if only it didn't have the silly behavior of transposing arguments that are already 2-d. As a reminder, here's the replacement implementation of column_stack I proposed on July 21:
    def column_stack(tup):
        def transpose_1d(array):
            # turn a 1-d array into a column; leave 2-d input untouched
            if array.ndim < 2:
                return _nx.transpose(atleast_2d(array))
            else:
                return array
        arrays = [transpose_1d(atleast_1d(a)) for a in tup]
        return _nx.concatenate(arrays, 1)
This was in a big ticket I submitted about overhauling r_, c_, etc., which was largely ignored. Maybe I should resubmit this by itself...
--bb
On Tue, Aug 29, 2006 at 03:46:45PM -0700, Mathew Yeates wrote:
> My head is about to explode.
> I have an M by N array of floats. Associated with the columns are character labels ['a','b','b','c','d','e','e','e'] (note: already sorted, so duplicates are contiguous).
> I want to replace the 2 'b' columns with the sum of the 2 columns. Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
> The resulting array still has M rows but fewer than N columns. Anyone? Couldn't be any harder than Sudoku.
I attach one possible solution (allowing for the same column name occurring in different places, i.e. ['a','b','b','a']). I'd be glad for any suggestions on how to clean up the code. Regards Stéfan
participants (6)
- Bill Baxter
- Charles R Harris
- Keith Goodman
- Mathew Yeates
- Stefan van der Walt
- Sven Schreiber