[AstroPy] astropy.table.Table groups: aggregate over & combine multiple columns?
andrew.hearin at yale.edu
Wed Jan 20 07:47:29 EST 2016
Great question, I'm also really interested to hear the answer to this. I
always use the built-in table aggregation functions when possible, but
sometimes end up writing my own Numpy calculation for more complicated
examples (using np.unique and/or np.searchsorted).
For computing a group-wise weighted average, there is a way you can recast
your problem that allows you to use the existing astropy built-in: just
create a new column that is the product of your second and third columns, '
and then use aggregate(average) in the normal way on this new column.
So I *think* that gives an answer to the specific example you gave, but it
dodges the real question, which I am also interested to hear the experts
weigh in on.
On Tue, Jan 19, 2016 at 11:52 PM, Evert Rol <evert.rol at gmail.com> wrote:
> Is there a way in an astropy table to run the TableGroups aggregate
> function on multiple columns at once?
> In this specific case, I'd like to group by names in one column, and then
> average the second column weighted by values in the third column.
> An example would be:
> from astropy.table import Table
> def average(col):
> # Manipulate multiple columns at once?
> return col.mean()
> t = Table([['a', 'a', 'a', 'b', 'b', 'c'],
> [1, 2, 3, 4, 5, 6],
> [2, 2, 1, 2, 1, 1]],
> names=('name', 'value', 'weight'))
> group = t.group_by('name')
> result = group.groups.aggregate(average)
> which gives
> name value weight
> ---- ----- -------------
> a 2.0 1.66666666667
> b 4.5 1.5
> c 6.0 1.0
> which is not what I want.
> In Pandas, this can be done with apply() on a groupby object, since that
> passes the relevant subsection the dataframe as input to the function.
> So I can write:
> def average_pd(df):
> weight = df['weight']
> total = weight.sum()
> df['value'] *= weight / total
> df['value'] = df['value'].sum()
> df['weight'] = total # for info; not necessary
> return df.iloc # ignore other rows: they are the same anyway
> df = t.to_pandas()
> result = df.groupby('name')[['value', 'weight']].apply(average_pd)
> which gives:
> value weight
> a 1.800000 5
> b 4.333333 3
> c 6.000000 1
> and 'value' consists of weighted averages.
> (code also on
> Perhaps I overlooked some documentation, but I can't find if this can be
> done in astropy.table. Or do I just need to approach this differently?
> Alternatively, should I convert & stick to Pandas for this type of
> AstroPy mailing list
> AstroPy at scipy.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the AstroPy