[AstroPy] astropy.table.Table groups: aggregate over & combine multiple columns?
Andrew Hearin
andrew.hearin at yale.edu
Wed Jan 20 07:47:29 EST 2016
Hi Evert,
Great question, I'm also really interested to hear the answer to this. I
always use the built-in table aggregation functions when possible, but
sometimes end up writing my own Numpy calculation for more complicated
examples (using np.unique and/or np.searchsorted).
For computing a group-wise weighted average, there is a way you can recast
your problem that allows you to use the existing astropy built-in: just
create a new column that is the product of your second and third columns, '
and then use aggregate(average) in the normal way on this new column.
So I *think* that gives an answer to the specific example you gave, but it
dodges the real question, which I am also interested to hear the experts
weigh in on.
Andrew
On Tue, Jan 19, 2016 at 11:52 PM, Evert Rol <evert.rol at gmail.com> wrote:
> Is there a way in an astropy table to run the TableGroups aggregate
> function on multiple columns at once?
>
> In this specific case, I'd like to group by names in one column, and then
> average the second column weighted by values in the third column.
> An example would be:
>
> from astropy.table import Table
>
> def average(col):
> # Manipulate multiple columns at once?
> return col.mean()
>
> t = Table([['a', 'a', 'a', 'b', 'b', 'c'],
> [1, 2, 3, 4, 5, 6],
> [2, 2, 1, 2, 1, 1]],
> names=('name', 'value', 'weight'))
> group = t.group_by('name')
> result = group.groups.aggregate(average)
> print(result)
>
> which gives
>
> name value weight
> ---- ----- -------------
> a 2.0 1.66666666667
> b 4.5 1.5
> c 6.0 1.0
>
> which is not what I want.
>
>
> In Pandas, this can be done with apply() on a groupby object, since that
> passes the relevant subsection the dataframe as input to the function.
> So I can write:
>
> def average_pd(df):
> weight = df['weight']
> total = weight.sum()
> df['value'] *= weight / total
> df['value'] = df['value'].sum()
> df['weight'] = total # for info; not necessary
> return df.iloc[0] # ignore other rows: they are the same anyway
>
> df = t.to_pandas()
> result = df.groupby('name')[['value', 'weight']].apply(average_pd)
> print(result)
>
> which gives:
>
> value weight
> name
> a 1.800000 5
> b 4.333333 3
> c 6.000000 1
>
> and 'value' consists of weighted averages.
>
> (code also on
> https://urldefense.proofpoint.com/v2/url?u=https-3A__gist.github.com_evertrol_12955a5d98edf055a2f4&d=AwICAg&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=AHkQ8HPUDwzl0x62ybAnwN_OEebPRGDtcjUPBcnLYw4&m=-BKjmG3hTRkdOfmFIKI5e3myB8cKiFHeJbTAhi3Zg5U&s=ix-QzeHis8ltaMFyVo3QvHpnQYri_s75MpTGsufcbqM&e=
> )
>
>
> Perhaps I overlooked some documentation, but I can't find if this can be
> done in astropy.table. Or do I just need to approach this differently?
> Alternatively, should I convert & stick to Pandas for this type of
> functionality?
>
>
> Evert
>
>
>
>
>
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.scipy.org_mailman_listinfo_astropy&d=AwICAg&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=AHkQ8HPUDwzl0x62ybAnwN_OEebPRGDtcjUPBcnLYw4&m=-BKjmG3hTRkdOfmFIKI5e3myB8cKiFHeJbTAhi3Zg5U&s=XZ616g8wR7LBzFglTQ8J2F-bDe6rE-HuXIrePKntv6w&e=
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/astropy/attachments/20160120/5b3d9960/attachment.html>
More information about the AstroPy
mailing list