[Numpy-discussion] How to implement a 'pivot table?'

Fri Aug 3 01:57:06 EDT 2007

What is ugly about the module? I like it!

What do you mean about recarray's? Do you think they are they not
appropriate for this type of thing?

When i get some time i'll run some tests versus SAS for the same
operations and do a speed comparison.

Question: Would there be an easy way to merge the summary stats back
into the recarray?

Best,

Vincent

On Aug 1, 11:22 pm, Travis Vaught <tra... at enthought.com> wrote:
> Greetings,
>
> Speaking of brute force... I've attached a rather ugly module that
> let's you do things with a pretty simple interface (session shown
> below).  I haven't fully tested the performance, but a million
> records with 5 fields takes about 11 seconds on my Mac to do a
> 'mean'.  I'm not sure what your performance considerations are, but
> this may be useful.  Record arrays are really nice if they make sense
> for your data.
>
> Travis
>
> (from an ipython command prompt)
>
> In [1]: import testpivot as p
>
> In [2]: a = p.sample_data()
>
> In [3]: a
> Out[3]:
> recarray([('ACorp', 'Region 1', 'Q1', 20000.0),
>         ('ACorp', 'Region 1', 'Q2', 22000.0),
>         ('ACorp', 'Region 1', 'Q3', 21000.0),
>         ('ACorp', 'Region 1', 'Q4', 26000.0),
>         ('ACorp', 'Region 2', 'Q1', 23000.0),
>         ('ACorp', 'Region 2', 'Q2', 20000.0),
>         ('ACorp', 'Region 2', 'Q3', 22000.0),
>         ('ACorp', 'Region 2', 'Q4', 21000.0),
>         ('ACorp', 'Region 3', 'Q1', 26000.0),
>         ('ACorp', 'Region 3', 'Q2', 23000.0),
>         ('ACorp', 'Region 3', 'Q3', 29000.0),
>         ('ACorp', 'Region 3', 'Q4', 27000.0),
>         ('BCorp', 'Region 1', 'Q1', 20000.0),
>         ('BCorp', 'Region 1', 'Q2', 20000.0),
>         ('BCorp', 'Region 1', 'Q3', 24000.0),
>         ('BCorp', 'Region 1', 'Q4', 24000.0),
>         ('BCorp', 'Region 2', 'Q1', 21000.0),
>         ('BCorp', 'Region 2', 'Q2', 21000.0),
>         ('BCorp', 'Region 2', 'Q3', 22000.0),
>         ('BCorp', 'Region 2', 'Q4', 29000.0),
>         ('BCorp', 'Region 3', 'Q1', 28000.0),
>         ('BCorp', 'Region 3', 'Q2', 25000.0),
>         ('BCorp', 'Region 3', 'Q3', 22000.0),
>         ('BCorp', 'Region 3', 'Q4', 21000.0)],
>        dtype=[('company', '|S5'), ('region', '|S8'), ('quarter', '|
> S2'), ('income', '<f8')])
>
> In [4]: p.pivot(a, 'company', 'region', 'income', p.psum)
> ######## Summary by company and region ##########
> cols:['ACorp' 'BCorp']
> rows:['Region 1' 'Region 2' 'Region 3']
> [[  89000.   88000.]
> [  86000.   93000.]
> [ 105000.   96000.]]
>
> In [5]: p.pivot(a, 'company', 'quarter', 'income', p.psum)
> ######## Summary by company and quarter ##########
> cols:['ACorp' 'BCorp']
> rows:['Q1' 'Q2' 'Q3' 'Q4']
> [[ 69000.  69000.]
> [ 65000.  66000.]
> [ 72000.  68000.]
> [ 74000.  74000.]]
>
> In [6]: p.pivot(a, 'company', 'quarter', 'income', p.pmean)
> ######## Summary by company and quarter ##########
> cols:['ACorp' 'BCorp']
> rows:['Q1' 'Q2' 'Q3' 'Q4']
> [[ 23000.          23000.        ]
> [ 21666.66666667  22000.        ]
> [ 24000.          22666.66666667]
> [ 24666.66666667  24666.66666667]]
>
>  testpivot.py
> 3KDownload
>
>
>
> On Aug 1, 2007, at 2:02 PM, Bruce Southey wrote:
>
> > Hi,
> > The hard part is knowing what aggregate function that you want. So a
> > hard way, even after cheating, to take the data provided is given
> > below. (The Numpy Example List was very useful especially on the where
> > function)!
>
> > I tried to be a little generic so you can replace the sum by any
> > suitable function and probably the array type as well. Of course it is
> > not complete because you still need to know the levels of the 'rows'
> > and 'columns' and also is not efficient as it has loops.
>
> > Bruce
>
> > from numpy import *
> > A=array([[1,1,10],
> >          [1,1,20],
> >          [1,2,30],
> >          [2,1,40],
> >          [2,2,50],
> >          [2,2,60] ])
> > C = zeros((2,2))
>
> > for i in range(2):
> >       crit1 = (A[:,0]==1+i)
> >       subA=A[crit1,1:]
> >       for j in range(2):
> >             crit2 = (subA[:,0]==1+j)
> >             subB=subA[crit2,1:]
> >             C[i,j]=subB.sum()
>
> > print C
>
> > On 7/30/07, Geoffrey Zhu <zyzhu2... at gmail.com> wrote:
> >> Hi Everyone,
>
> >> I am wondering what is the best (and fast) way to build a pivot table
> >> aside from the 'brute force way?'
>
> >> I want to transform an numpy array into a pivot table. For
> >> example, if
> >> I have a numpy array like below:
>
> >> Region     Date          # of Units
> >> ----------    ----------        --------------
> >> East        1/1             10
> >> East        1/1             20
> >> East        1/2             30
> >> West       1/1             40
> >> West       1/2             50
> >> West       1/2             60
>
> >> I want  to transform this into the following table, where f() is a
> >> given aggregate function:
>
> >>            Date
> >> Region           1/1          1/2
> >> ----------
> >> East         f(10,20)         f(30)
> >> West        f(40)             f(50,60)
>
> >> I can regroup them into 'sets' and do it the brute force way, but
> >> that
> >> is kind of slow to execute. Does anyone know a better way?
>
> >> Thanks,
> >> Geoffrey
> >> _______________________________________________
> >> Numpy-discussion mailing list
> >> Numpy-discuss... at scipy.org
> >>http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discuss... at scipy.org
> >http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discuss... at scipy.orghttp://projects.scipy.org/mailman/listinfo/numpy-discussion