[AstroPy] astropy.table.Table groups: aggregate over & combine multiple columns?

Wed Jan 20 07:47:29 EST 2016

Hi Evert,

Great question, I'm also really interested to hear the answer to this. I
always use the built-in table aggregation functions when possible, but
sometimes end up writing my own Numpy calculation for more complicated
examples (using np.unique and/or np.searchsorted).

For computing a group-wise weighted average, there is a way you can recast
your problem that allows you to use the existing astropy built-in: just
create a new column that is the product of your second and third columns, '
and then use aggregate(average)  in the normal way on this new column.

So I *think* that gives an answer to the specific example you gave, but it
dodges the real question, which I am also interested to hear the experts
weigh in on.

Andrew

On Tue, Jan 19, 2016 at 11:52 PM, Evert Rol <evert.rol at gmail.com> wrote:

> Is there a way in an astropy table to run the TableGroups aggregate
> function on multiple columns at once?
>
> In this specific case, I'd like to group by names in one column, and then
> average the second column weighted by values in the third column.
> An example would be:
>
>   from astropy.table import Table
>
>   def average(col):
>       # Manipulate multiple columns at once?
>       return col.mean()
>
>   t = Table([['a', 'a', 'a', 'b', 'b', 'c'],
>              [1, 2, 3, 4, 5, 6],
>              [2, 2, 1, 2, 1, 1]],
>             names=('name', 'value', 'weight'))
>   group = t.group_by('name')
>   result = group.groups.aggregate(average)
>   print(result)
>
> which gives
>
>   name value     weight
>   ---- ----- -------------
>      a   2.0 1.66666666667
>      b   4.5           1.5
>      c   6.0           1.0
>
> which is not what I want.
>
>
> In Pandas, this can be done with apply() on a groupby object, since that
> passes the relevant subsection the dataframe as input to the function.
> So I can write:
>
>   def average_pd(df):
>       weight = df['weight']
>       total = weight.sum()
>       df['value'] *= weight / total
>       df['value'] = df['value'].sum()
>       df['weight'] = total  # for info; not necessary
>       return df.iloc[0]  # ignore other rows: they are the same anyway
>
>   df = t.to_pandas()
>   result = df.groupby('name')[['value', 'weight']].apply(average_pd)
>   print(result)
>
> which gives:
>
>          value  weight
> name
> a     1.800000       5
> b     4.333333       3
> c     6.000000       1
>
> and 'value' consists of weighted averages.
>
> (code also on
> https://urldefense.proofpoint.com/v2/url?u=https-3A__gist.github.com_evertrol_12955a5d98edf055a2f4&d=AwICAg&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=AHkQ8HPUDwzl0x62ybAnwN_OEebPRGDtcjUPBcnLYw4&m=-BKjmG3hTRkdOfmFIKI5e3myB8cKiFHeJbTAhi3Zg5U&s=ix-QzeHis8ltaMFyVo3QvHpnQYri_s75MpTGsufcbqM&e=
> )
>
>
> Perhaps I overlooked some documentation, but I can't find if this can be
> done in astropy.table. Or do I just need to approach this differently?
> Alternatively, should I convert & stick to Pandas for this type of
> functionality?
>
>
>   Evert
>
>
>
>
>
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.scipy.org_mailman_listinfo_astropy&d=AwICAg&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=AHkQ8HPUDwzl0x62ybAnwN_OEebPRGDtcjUPBcnLYw4&m=-BKjmG3hTRkdOfmFIKI5e3myB8cKiFHeJbTAhi3Zg5U&s=XZ616g8wR7LBzFglTQ8J2F-bDe6rE-HuXIrePKntv6w&e=
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/astropy/attachments/20160120/5b3d9960/attachment.html>