calculating the mean for each factor (like tapply in R)
Hi there,

I've just moved from R to IPython and wondered if there was a good way of finding the means and/or variance of values in a dataframe given a factor

e.g.: if df =

     x  experiment
    10  1
    13  1
    12  1
     3  2
     4  2
     6  2
    33  3
    44  3
    55  3

in tapply you would do:

    tapply(df$x, list(df$experiment), mean)
    tapply(df$x, list(df$experiment), var)

I guess I can always loop through the array for each experiment type, but thought that this is the kind of functionality that would be included in a core library.

Many thanks,
Ben
Pandas (http://pandas.pydata.org/) seems to be what you're looking for. It has a DataFrame class which allows grouping of data.

Cheers,
Andreas.
Hi,

It is pretty much the same as looping, but you could do the following:

    In [1]: import numpy as np

    In [2]: x = np.array([10,13,12,3,4,6,33,44,55])

    In [3]: exps = np.array([1,1,1,2,2,2,3,3,3])

    In [4]: z = [np.mean(x[exps == i]) for i in np.unique(exps)]

--
Oleksandr (Sasha) Huziy
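The list comprehension above rescans x once per experiment. As a minimal sketch of a loop-free NumPy alternative (not something posted in the thread), the labels can be mapped to dense group indices with np.unique(..., return_inverse=True) and the per-group sums accumulated with np.bincount. The names idx, counts, sums and sq_dev are illustrative, and the variance assumes the sample definition (divide by n - 1), which is what R's var() computes.

```python
import numpy as np

x = np.array([10, 13, 12, 3, 4, 6, 33, 44, 55], dtype=float)
exps = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

# labels: sorted unique experiment ids; idx: position of each row's label in `labels`
labels, idx = np.unique(exps, return_inverse=True)

# Per-group row counts and sums, accumulated in one pass each
counts = np.bincount(idx)
sums = np.bincount(idx, weights=x)
means = sums / counts

# Sample variance (ddof=1): summed squared deviations from each row's group mean.
# A group with a single row would divide by zero here.
sq_dev = np.bincount(idx, weights=(x - means[idx]) ** 2)
variances = sq_dev / (counts - 1)

for label, m, v in zip(labels, means, variances):
    print(label, m, v)
```

The work here is one sort inside np.unique plus a few linear passes, rather than one boolean mask over the full array for every group.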
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Wed, Aug 1, 2012 at 9:35 AM, Oleksandr Huziy <guziy.sasha@gmail.com> wrote:
> It is pretty much the same as looping, but you could do the following
> [...]
> z = [np.mean(x[exps == i]) for i in np.unique( exps )]
For the #lazyweb, here is what this looks like in pandas:

    In [24]: df
    Out[24]:
        x  experiment
    0  10           1
    1  13           1
    2  12           1
    3   3           2
    4   4           2
    5   6           2
    6  33           3
    7  44           3
    8  55           3

    In [25]: df.groupby('experiment').x.mean()
    Out[25]:
    experiment
    1    11.666667
    2     4.333333
    3    44.000000
    Name: x

    In [26]: df.groupby('experiment').x.var()
    Out[26]:
    experiment
    1      2.333333
    2      2.333333
    3    121.000000
    Name: x

or if you want to be fancy:

    In [27]: df.groupby('experiment').x.agg(['mean', 'var'])
    Out[27]:
                     mean         var
    experiment
    1           11.666667    2.333333
    2            4.333333    2.333333
    3           44.000000  121.000000

There are good reasons to use pandas over a DIY approach with NumPy array operations; notably I use smart algorithms so that the runtime scales linearly with the size of the data instead of quadratically.

- Wes
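The session above starts from an existing df. For anyone who wants to run it from scratch, here is a minimal sketch that builds the example frame by hand; the dict literal is an assumption about how the frame might be constructed, since the thread does not show it. Bracket indexing with ["x"] selects the same column as the .x attribute access used above, and pandas' var defaults to ddof=1, so the numbers line up with R's var().

```python
import pandas as pd

# Example data from the original question
df = pd.DataFrame({
    "x": [10, 13, 12, 3, 4, 6, 33, 44, 55],
    "experiment": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# One row per experiment, one column per aggregate
summary = df.groupby("experiment")["x"].agg(["mean", "var"])
print(summary)
```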
participants (4)
- Andreas Hilboll
- Ben Temperton
- Oleksandr Huziy
- Wes McKinney