Hi Timothy, On 7/30/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 7/30/07, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
Hi Everyone,
I am wondering what is the best (and fast) way to build a pivot table aside from the 'brute force way?'
What's the brute force way? It's easier to offer an improved suggestion if we know what we're trying to beat.
I want to transform an numpy array into a pivot table. For example, if I have a numpy array like below:
Region Date # of Units ---------- ---------- -------------- East 1/1 10 East 1/1 20 East 1/2 30 West 1/1 40 West 1/2 50 West 1/2 60
I want to transform this into the following table, where f() is a given aggregate function:
Date Region 1/1 1/2 ---------- East f(10,20) f(30) West f(40) f(50,60)
I can regroup them into 'sets' and do it the brute force way, but that is kind of slow to execute. Does anyone know a better way?
I would use a python to dictionary to assemble lists of values. I would key off (region/date) tuples. In outline:
map = {} dates = set() regions = set() for (region, date, units) in data: dates.add(date) regions.add(regions) key = (region, date) if key not in map: map[key] = [] map[key].append(data)
Once you have map, regions and dates, you can trivially make a table as above. The details will depend on what format you want the table to have, but it should be easy to do.
Thanks, Geoffrey _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- . __ . |-\ . . tim.hochberg@ieee.org _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
The 'brute-force' way is basically what you suggested -- looping through all the records and building a two-way hash-table of the data. The problem of the brute-force' approach is that it is not taking advantage of facilities of numpy and can be slow in speed. If only there is some built-in mechanism in numpy to handle this. The other thing I am not sure is in your map object above, do I append the row number to the numpy array or do I append the row object (such as data[r])? Thanks, Geoffrey