Hi Timothy, On 7/31/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
[SNIP]
The 'brute-force' way is basically what you suggested -- looping through all the records and building a two-way hash-table of the data.
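For concreteness, a minimal sketch of that brute-force two-way hash-table approach might look like the following (the record values here are made-up examples, not your actual data):

```python
# Hypothetical example records: (key, date) pairs.
records = [("a", "2007-01-01"), ("b", "2007-01-02"), ("a", "2007-01-02")]

key_to_dates = {}  # forward map: key -> set of dates it appears with
date_to_keys = {}  # reverse map: date -> set of keys it appears with
for key, date in records:
    key_to_dates.setdefault(key, set()).add(date)
    date_to_keys.setdefault(date, set()).add(key)
```

One pass over the records, constant-time dict/set operations per record.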
The problem with the 'brute-force' approach is that it doesn't take advantage of numpy's facilities and may be slow. If only there were some built-in mechanism in numpy to handle this.
I'm curious; have you tried this and found it slow, or is this a hunch based on the reputed slowness of Python? Algorithmically, you can't do much better: the dictionary and set operations are O(1), so the whole operation is O(N), and you won't do any better than that, order-wise. What you're left with is trying to reduce the constant factors.
I agree that algorithmically you can't do much better. It is basically a C vs Python thing. One advantage of numpy is that you can do vectorized operations at the speed of C. With this algorithm, we have to process the data element by element, and the speed advantage of numpy is lost. Since the data has to be stored in Python sets and dicts, I imagine the storage advantage is also lost. I was in fact looking for an implementation of this algorithm in numpy (and so in C), or one that can leverage the fast numpy routines. I haven't tried it with the real data load yet. I know the number of records will be huge, and it is just a hunch that it will be slow.
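One way to keep the per-element work in C, as a hedged sketch: factorize one column with np.unique and group the record indices by key using argsort/searchsorted instead of a Python loop. The sample data here is an assumption for illustration:

```python
import numpy as np

# Hypothetical column of keys from the record array.
keys = np.array(["a", "b", "a", "c", "b"])

# Map each element to an integer code in C, not in a Python loop.
uniq, inverse = np.unique(keys, return_inverse=True)

# Stable argsort clusters equal codes; searchsorted finds each
# group's start, and split yields the record indices per unique key.
order = np.argsort(inverse, kind="stable")
starts = np.searchsorted(inverse[order], np.arange(len(uniq)))
groups = np.split(order, starts[1:])
```

Building the reverse mapping is the same trick applied to the other column; the element-by-element cost moves from Python into numpy's C routines.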
There are various ways one might go about reducing constant factors, but they depend on the details of the problem. For example, if the dates are dense and you are going to parse them anyway, you could replace the hash with a table that you index into with the date as an integer. I'm not sure that you are going to do a lot better than the brute-force algorithm in the generic case, though.
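That dense-date idea might be sketched like this; the epoch, the one-year span, and the sample values are all assumptions for illustration:

```python
from datetime import date

# Assumed: all dates fall within one known year, so days-since-epoch
# is a small integer usable as a plain list index.
epoch = date(2007, 1, 1)
n_days = 365  # assumed span of the data

table = [None] * n_days
for d, value in [(date(2007, 1, 5), "x"), (date(2007, 3, 1), "y")]:
    table[(d - epoch).days] = value  # direct indexing, no hashing

# O(1) lookup with no hash computation:
hit = table[(date(2007, 1, 5) - epoch).days]
```

The table trades a little memory (slots for dates that never occur) for cheaper lookups than a dict.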
Unfortunately it has to be something generic. Thanks a lot for your help. Geoffrey