[Numpy-discussion] How to implement a 'pivot table?'

Mon Jul 30 15:32:17 EDT 2007

Hi Timothy,

On 7/30/07, Timothy Hochberg <tim.hochberg at ieee.org> wrote:
>
>
> On 7/30/07, Geoffrey Zhu <zyzhu2000 at gmail.com> wrote:
> > Hi Everyone,
> >
> > I am wondering what is the best (and fast) way to build a pivot table
> > aside from the 'brute force way?'
>
> What's the brute force way? It's easier to offer an improved suggestion if
> we know what we're trying to beat.
>
> > I want to transform an numpy array into a pivot table. For example, if
> > I have a numpy array like below:
> >
> > Region     Date          # of Units
> > ----------    ----------        --------------
> > East        1/1             10
> > East        1/1             20
> > East        1/2             30
> > West       1/1             40
> > West       1/2             50
> > West       1/2             60
> >
> > I want  to transform this into the following table, where f() is a
> > given aggregate function:
> >
> >            Date
> > Region           1/1          1/2
> > ----------
> > East         f(10,20)         f(30)
> > West        f(40)             f(50,60)
> >
> >
> > I can regroup them into 'sets' and do it the brute force way, but that
> > is kind of slow to execute. Does anyone know a better way?
>
> I would use a python to dictionary to assemble lists of values. I would key
> off (region/date) tuples. In outline:
>
> map = {}
> dates = set()
> regions = set()
> for (region, date, units) in data:
>     dates.add(date)
>     regions.add(regions)
>     key = (region, date)
>     if key not in map:
>          map[key] = []
>     map[key].append(data)
>
> Once you have map, regions and dates, you can trivially make a table as
> above.  The details will depend on what format you want the table to have,
> but it should be easy to do.
>
>
> > Thanks,
> > Geoffrey
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at scipy.org
> >
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
> >
>
>
>
> --
> .  __
> .   |-\
> .
> .  tim.hochberg at ieee.org
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

The 'brute-force' way is basically what you suggested -- looping
through all the records and building a two-way hash-table of the data.

The problem of the brute-force' approach is that it is not taking
advantage of facilities of numpy and can be slow in speed. If only
there is some built-in mechanism in numpy to handle this.

The other thing I am not sure is in your map object above, do I append
the row number to the numpy array or do I append the row object (such
as data[r])?

Thanks,
Geoffrey