[Numpy-discussion] How to implement a 'pivot table?'
Geoffrey Zhu
zyzhu2000 at gmail.com
Mon Jul 30 15:32:17 EDT 2007
Hi Timothy,
On 7/30/07, Timothy Hochberg <tim.hochberg at ieee.org> wrote:
>
>
> On 7/30/07, Geoffrey Zhu <zyzhu2000 at gmail.com> wrote:
> > Hi Everyone,
> >
> > I am wondering what is the best (and fast) way to build a pivot table
> > aside from the 'brute force way?'
>
> What's the brute force way? It's easier to offer an improved suggestion if
> we know what we're trying to beat.
>
> > I want to transform an numpy array into a pivot table. For example, if
> > I have a numpy array like below:
> >
> > Region Date # of Units
> > ---------- ---------- --------------
> > East 1/1 10
> > East 1/1 20
> > East 1/2 30
> > West 1/1 40
> > West 1/2 50
> > West 1/2 60
> >
> > I want to transform this into the following table, where f() is a
> > given aggregate function:
> >
> > Date
> > Region 1/1 1/2
> > ----------
> > East f(10,20) f(30)
> > West f(40) f(50,60)
> >
> >
> > I can regroup them into 'sets' and do it the brute force way, but that
> > is kind of slow to execute. Does anyone know a better way?
>
> I would use a python to dictionary to assemble lists of values. I would key
> off (region/date) tuples. In outline:
>
> map = {}
> dates = set()
> regions = set()
> for (region, date, units) in data:
> dates.add(date)
> regions.add(regions)
> key = (region, date)
> if key not in map:
> map[key] = []
> map[key].append(data)
>
> Once you have map, regions and dates, you can trivially make a table as
> above. The details will depend on what format you want the table to have,
> but it should be easy to do.
>
>
> > Thanks,
> > Geoffrey
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discussion at scipy.org
> >
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
> >
>
>
>
> --
> . __
> . |-\
> .
> . tim.hochberg at ieee.org
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
The 'brute-force' way is basically what you suggested -- looping
through all the records and building a two-way hash-table of the data.
The problem of the brute-force' approach is that it is not taking
advantage of facilities of numpy and can be slow in speed. If only
there is some built-in mechanism in numpy to handle this.
The other thing I am not sure is in your map object above, do I append
the row number to the numpy array or do I append the row object (such
as data[r])?
Thanks,
Geoffrey
More information about the NumPy-Discussion
mailing list