[Numpy-discussion] Grouping & Collapsing multi-dimensional data

Mike Biglan mike at biglan.org
Tue Mar 20 13:44:58 EDT 2007


I might be using the wrong terminology, but I'm trying to collapse a 2d
array in which each row holds a department object followed by 36 floats,
e.g. [dept1, 3, 6, 7, ...]
With SQL or R I know how to collapse a simple 2d data structure like
this.  For example, in SQL:
    select dept, stddev(field1)... from tbl_x group by dept
I want to end up with either a collapsed table grouped on dept,
where each element is some summary statistic, or, better yet, one where
each value is a dictionary mapping the name of each summary statistic to
its value.
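
To make the target concrete, here is a minimal sketch of the result described above. It assumes each department can be reduced to a plain sortable key such as a name string (the department object itself is not used), and the helper name group_summary and the sample data are made up for illustration.

```python
import numpy as np

def group_summary(labels, values, stats):
    """Collapse rows of `values` by `labels`.

    Returns {label: {stat_name: per-column result}}.
    `group_summary` is a hypothetical helper, not a NumPy function.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    # Sorted unique labels plus, for every row, the index of its group.
    keys, inverse = np.unique(labels, return_inverse=True)
    result = {}
    for i, key in enumerate(keys):
        rows = values[inverse == i]  # all rows belonging to this group
        result[key.item()] = {name: func(rows, axis=0)
                              for name, func in stats.items()}
    return result

# Assumed sample data: a department name per row and two float columns.
depts = ["sales", "ops", "sales", "ops", "sales"]
data = np.array([[3.0, 6.0], [1.0, 2.0], [5.0, 6.0], [3.0, 2.0], [4.0, 9.0]])

stats = {
    "mean": np.mean,
    # ddof=1 gives the sample standard deviation; np.std defaults to ddof=0.
    "std": lambda a, axis: np.std(a, axis=axis, ddof=1),
}
summary = group_summary(depts, data, stats)
```

Here summary["sales"]["mean"] holds one mean per float column for the sales rows. The loop runs once per group, so this is only reasonable when the number of groups is modest.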

Even sorting the rows and processing them by group has been problematic:
sorting by the object gives me an error when that object has a __cmp__
method.  I can do an index-based sort on a field of the object, but then
I must figure out the groups of rows manually.  There must be an easier
way to collapse along a dimension, grouping by one or more elements and
applying arbitrary functions.  Can this be done with numpy?  Should I
work with scipy?
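
The index-based workaround mentioned above can be sketched roughly as follows. The Dept class and its name field are hypothetical stand-ins for the real department object; the idea is to sort on a plain key taken from each object and then locate the group boundaries in the sorted keys.

```python
import numpy as np

class Dept:
    # Hypothetical stand-in for the department object; only `name` is used.
    def __init__(self, name):
        self.name = name

depts = [Dept("sales"), Dept("ops"), Dept("sales"), Dept("ops"), Dept("hr")]
values = np.arange(10, dtype=float).reshape(5, 2)

# Extract a plain sortable key from each object instead of sorting the objects.
keys = np.array([d.name for d in depts])

# Stable sort so rows keep their original relative order within each group.
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
sorted_values = values[order]

# A new group starts wherever the sorted key differs from its predecessor.
starts = np.flatnonzero(sorted_keys[1:] != sorted_keys[:-1]) + 1
groups = np.split(sorted_values, starts)
group_keys = sorted_keys[np.r_[0, starts]]

# Apply any function per group; population standard deviation per column here.
collapsed = {str(k): g.std(axis=0) for k, g in zip(group_keys, groups)}
```

Because the sort only ever compares the extracted keys, it does not depend on how the objects themselves compare.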

thanks -

mike biglan

P.S. My example here is 2d, but I'm hoping this functionality would
work for any number of dimensions.
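
For the any-d case, one possible sketch groups along the first axis and leaves the trailing dimensions alone. It assumes the group labels are kept in a separate 1-d array with one entry per slice along axis 0; collapse_axis0 is a made-up name, not an existing numpy function.

```python
import numpy as np

def collapse_axis0(labels, values, func):
    """Group `values` along axis 0 by `labels` and reduce each group with `func`.

    `func` must accept an `axis` keyword (e.g. np.mean, np.std).
    Returns (unique_labels, reduced), where reduced has shape
    (n_groups,) + values.shape[1:].
    """
    labels = np.asarray(labels)
    values = np.asarray(values)
    keys, inverse = np.unique(labels, return_inverse=True)
    reduced = np.stack([func(values[inverse == i], axis=0)
                        for i in range(len(keys))])
    return keys, reduced

# Assumed sample data: 4 slices of shape (2, 3), two groups labelled 0 and 1.
labels = np.array([1, 0, 1, 0])
cube = np.arange(24, dtype=float).reshape(4, 2, 3)
keys, means = collapse_axis0(labels, cube, np.mean)
```

Grouping along a different axis could be handled by moving that axis to the front with np.moveaxis before calling the helper.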



